Xilinx ISE ver 8.2.02i is optimizing away and removing "redundant" logic

jame...@yahoo.ca

Sep 9, 2006, 2:39:19 PM

Hello All,

I am writing a VHDL design for a Xilinx FPGA using
ISE ver 8.2.02i (8.2 SP 2) and I'm trying to get
post-map simulating correctly now that I have
post-translate simulating correctly. I put "keep"
attributes on every single signal, including those
in the ports (editor keystroke macros help a lot).
I also ran the following command line in a command
(DOS) window: (just run the following three lines
together with spaces at the line breaks; it is a
copy from the DOS window): Note the added -u to
try to prevent logic removal, which is why I had
to run this in the command window; it's not
available as a setting in map properties.

map.exe -ise ppcaesh.ise -intstyle ise -p xc4vfx12-ff668-10 -cm speed
-detail
-pr b -k 4 -c 100 -u -o user_logic_map.ncd user_logic.ngd
user_logic.pcf

I have the map optimization now set for speed in
my project, reflected in this command. I tried
setting "no optimization" as well as all the other
optimization choices. The above command made the
post map files for me. Now, at least all the red
is gone from Post-Map simulation and some of the
bytes are right in my first section of output. I
think this is due to all the "keep" attributes I
added. My removed "redundant logic" list was made
a little smaller. Here is the syntax for "keep":

signal mysignal : std_logic; -- declare a signal

attribute keep : string; -- you just need this once
-- then you can do the next line for each signal you want.
attribute keep of mysignal : signal is "true";
-- this goes in the architecture section after the signal declarations
-- and before the "begin".

(Thanks to the thread "Looking for ways to keep
diagnostic signal from being optimized out
(Xilinx)" here in comp.arch.fpga)

This syntax is found in the Constraints Guide,
cgd.pdf.
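
For reference, here is a minimal sketch of where those lines sit in a complete design unit. The entity name keep_demo and its ports are hypothetical, made up purely for illustration; only the attribute declaration and its application to mysignal come from the fragment above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity, for illustration only.
entity keep_demo is
  port (
    a, b : in  std_logic;
    y    : out std_logic
  );
end entity keep_demo;

architecture rtl of keep_demo is
  signal mysignal : std_logic;  -- internal net we want to stay visible

  -- Declare the attribute once per architecture ...
  attribute keep : string;
  -- ... then apply it to each signal that should be preserved.
  attribute keep of mysignal : signal is "true";
begin
  mysignal <= a xor b;
  y        <= not mysignal;
end architecture rtl;
```

Going by the map report that follows, the attribute on its own does not appear to stop the mapper from removing blocks it considers redundant.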

Here is a partial list of the removed logic:
Section 5 - Removed Logic
-------------------------
Optimized Block(s):
TYPE BLOCK
GND XST_GND
VCC XST_VCC

Redundant Block(s):
TYPE BLOCK
LOCALBUF u0/my_sub_mod_128_0_xo<1>1/LUT3_D_BUF
LOCALBUF u0/my_sub_mod_128_10_xo<1>1/LUT3_D_BUF
LOCALBUF u0/my_sub_mod_128_12_xo<1>1/LUT3_D_BUF
LOCALBUF u0/my_sub_mod_128_11_xo<1>1/LUT3_D_BUF
LOCALBUF u0/my_sub_mod_128_13_xo<1>1/LUT3_D_BUF
LOCALBUF u0/my_sub_mod_128_14_xo<1>1/LUT3_D_BUF
LOCALBUF u0/my_sub_mod_128_15_xo<1>1/LUT3_D_BUF
LOCALBUF u0/my_sub_mod_128_16_xo<1>1/LUT3_D_BUF
LOCALBUF u0/my_sub_mod_128_17_xo<1>1/LUT3_D_BUF
LOCALBUF u0/my_sub_mod_128_18_xo<1>1/LUT3_D_BUF
LOCALBUF u0/my_sub_mod_128_19_xo<1>1/LUT3_D_BUF
LOCALBUF u0/my_sub_mod_128_1_xo<1>1/LUT3_D_BUF
LOCALBUF u0/my_sub_mod_128_20_xo<1>1/LUT3_D_BUF
LOCALBUF u0/my_sub_mod_128_21_xo<1>1/LUT3_D_BUF
LOCALBUF u0/my_sub_mod_128_22_xo<1>1/LUT3_D_BUF
LOCALBUF u0/my_sub_mod_128_23_xo<1>1/LUT3_D_BUF
LOCALBUF u0/my_sub_mod_128_24_xo<1>1/LUT3_D_BUF
LOCALBUF u0/my_sub_mod_128_25_xo<1>1/LUT3_D_BUF
LOCALBUF u0/my_sub_mod_128_26_xo<1>1/LUT3_D_BUF
LOCALBUF u0/my_sub_mod_128_27_xo<1>1/LUT3_D_BUF
LOCALBUF u0/my_sub_mod_128_28_xo<1>1/LUT3_D_BUF
LOCALBUF u0/my_sub_mod_128_29_xo<1>1/LUT3_D_BUF
LOCALBUF u0/my_sub_mod_128_2_xo<1>1/LUT3_D_BUF
LOCALBUF u0/my_sub_mod_128_30_xo<1>1/LUT3_D_BUF
LOCALBUF u0/my_sub_mod_128_31_xo<1>1/LUT3_D_BUF
LOCALBUF u0/my_sub_mod_128_3_xo<1>1/LUT3_D_BUF
LOCALBUF u0/my_sub_mod_128_4_xo<1>1/LUT3_D_BUF
LOCALBUF u0/my_sub_mod_128_5_xo<1>1/LUT3_D_BUF
LOCALBUF u0/my_sub_mod_128_6_xo<1>1/LUT3_D_BUF
LOCALBUF u0/my_sub_mod_128_7_xo<1>1/LUT3_D_BUF
LOCALBUF u0/my_sub_mod_128_8_xo<1>1/LUT3_D_BUF
LOCALBUF u0/my_sub_mod_128_9_xo<1>1/LUT3_D_BUF
LUT1 myrst_inv1
LUT1 dcnt_Msub__sub0000_xor<0>11
INV Bus2IP_Clk_inv_INV_0
LOCALBUF Mxor__xor0019_Result1/LUT3_L_BUF
LOCALBUF Mxor__xor0055_Result1/LUT3_L_BUF
LOCALBUF Mxor__xor0127_Result1/LUT3_L_BUF
LOCALBUF Mxor__xor0083_Result1/LUT3_L_BUF
LOCALBUF user_logic_010_xo<2>1/LUT4_L_BUF
LOCALBUF user_logic_012_xo<2>1/LUT4_L_BUF
LOCALBUF user_logic_014_xo<2>1/LUT4_L_BUF
LOCALBUF user_logic_018_xo<2>1/LUT4_L_BUF
LOCALBUF user_logic_019_xo<2>1/LUT4_L_BUF
LOCALBUF user_logic_01_xo<2>1/LUT4_L_BUF
LOCALBUF user_logic_020_xo<2>1/LUT4_L_BUF
LOCALBUF user_logic_022_xo<2>1/LUT4_L_BUF
LOCALBUF user_logic_023_xo<2>1/LUT4_L_BUF
LOCALBUF user_logic_024_xo<2>1/LUT4_L_BUF
LOCALBUF user_logic_028_xo<2>1/LUT4_L_BUF
LOCALBUF user_logic_02_xo<2>1/LUT4_L_BUF
LOCALBUF user_logic_032_xo<2>1/LUT4_L_BUF
LOCALBUF user_logic_033_xo<2>1/LUT4_L_BUF
LOCALBUF user_logic_035_xo<2>1/LUT4_L_BUF
LOCALBUF user_logic_036_xo<2>1/LUT4_L_BUF
LOCALBUF user_logic_039_xo<2>1/LUT4_L_BUF
LOCALBUF user_logic_03_xo<2>1/LUT4_L_BUF
LOCALBUF user_logic_040_xo<2>1/LUT4_L_BUF
LOCALBUF user_logic_043_xo<2>1/LUT4_L_BUF
LOCALBUF user_logic_044_xo<2>1/LUT4_L_BUF
LOCALBUF user_logic_045_xo<2>1/LUT4_L_BUF
LOCALBUF user_logic_046_xo<2>1/LUT4_L_BUF
etc... (93 more lines)

As you can see from this list from the file
user_logic_map.mrp in the project directory, there
is still logic being removed. The optimized blocks
are still removed if you set "no optimization" and
the "redundant" blocks are still being removed
even with -u "Do Not Remove Unused Logic" command.
Could we have a "Do Not remove Any Logic" option?
-and have the "no optimization" setting respected
fully (when set)?

The key, I have learned, is to use the correct
Xilinx VHDL style, which is different for FPGAs
and ASICs. Once you follow that, you won't have
any more problems. Can someone advise me on this
correct syntax from this list of "optimized" and
"redundant" logic? Meanwhile I am reading the
xst.pdf manual, section on VHDL style to try some
things. The style to avoid latches I already used,
which really worked. Also the style to clock ROMs
so that they won't be optimized away as
asynchronous RAMs I already used, which did the
trick in post- translate.

Best regards,
-James

KJ

Sep 9, 2006, 4:21:00 PM

I guess the most obvious question I would have for you first off is "Why are
you bothering with this?" Let the tool do its job, which is to turn
VHDL/Verilog design source files into the properly formatted bitstream
needed to program the device.

Don't waste your time trying to prevent the tool from optimizing your
code...trust me, even the best code written by an experienced person can be
optimized to target a specific device.

I realize that doesn't address your question, just thought I'd save you what
seems to me to be a fruitless exercise on your part.

KJ

jame...@yahoo.ca

Sep 9, 2006, 7:36:53 PM

The reason I'm doing this is implied in my third paragraph:

"Now, at least all the red is gone from Post-Map simulation
and some of the bytes are right in my first section of output."

In other words I'm doing this because it wasn't working,
according to the simulator, and
this improved matters significantly but not completely. The
"red" I referred to was used by my simulator, ModelSim III XE 6.1e
starter edition, to indicate unknown output values, I think, since
X's for unknowns appeared post-map. They no longer appeared
after I did the above and then some but not all of my output was
actually correct.

In my first sentence I wrote "I'm trying to get post-map simulating
correctly now that I have post-translate simulating correctly."

In other words I'm trying to get the post-map design correct as
shown by the simulator, because it isn't. I got the behavioural
simulation showing correct logic and I got the post-translate
simulation to show correct operation by clocking my ROMs so
they wouldn't be optimized away.

Do you have any ideas to prevent "redundant logic" from being
removed? I've been told the key is in using different VHDL
coding style. I'm also going to look into putting "Save" on all
my nets. This is a constraint, according to dev.pdf, the Development
System Reference Guide, so I will look it up in the Constraints
Guide, cgd.pdf. "Keep Hierarchy" might help.

I suspect this is the problem because post-translate I was having
my ROMs inferred as RAMs and optimized away, and, as I
wrote in my third paragraph, "My removed "redundant logic" list
was made a little smaller," and operation was significantly
improved, as indicated in that paragraph. I figure that restoring
more removed logic might do the trick. It certainly looks like the
thing to try.

Does anyone have some samples of VHDL code before and after
that were interpreted as redundant and then weren't after being
changed?
Best regards,
-James

KJ

Sep 10, 2006, 11:29:36 AM

<jame...@yahoo.ca> wrote in message
news:1157845013.7...@b28g2000cwb.googlegroups.com...

> The reason I'm doing this is implied in my third paragraph:
> "Now, at least all the red is gone from Post-Map simulation
> and some of the bytes are right in my first section of output."
> In other words I'm doing this because it wasn't working,

And again, I'll point out that going about the task of 'getting it
working' by trying to disable synthesis optimizations is the wrong approach
and will likely be wasted effort.

> according to the simulator, and
> this improved matters significantly but not completely. The
> "red" I referred to was used by my simulator, ModelSim III XE 6.1e
> starter edition, to indicate unknown output values, I think, since
> X's for unknowns appeared post-map. They no longer appeared
> after I did the above and then some but not all of my output was
> actually correct.

But that means nothing until you understand why the optimized result was
'incorrect'. Remember that optimization does not involve changing the
overall function, just the implementation of that function. I have no doubt
that if you were to somehow disable every possible optimization that it
might be possible to have it emulate the code that you originally
wrote...but I'll contend that it still won't work for you in a real device
either.

>
> In my first sentence I wrote "I'm trying to get post-map simulating
> correctly now that I have post-translate simulating correctly."
> In other words I'm trying to get the post-map design correct as
> shown by the simulator, because it isn't.

While it's possible (but not terribly probable) that there is a bug in the
synthesis tool, the more likely explanation is in the source code that you
wrote. Synthesis to actual FPGA/CPLD/ASIC produces an output model that is
strictly std_logic/std_ulogic based...there are no 'enumerated types',
'integers', etc. Those output models also model expected propagation delays
that will exist in the actual device. That being the case, here are the
things to look for and how to go about looking for them.

- Peruse the synthesis report for warnings. If it runs across code that is
valid but is not well synthesized there will usually be a warning (the
classic example being the latch, signal 'initialization' values being
another one). Comb through those warnings and fix them.
- Peruse the timing report for timing conditions. Timing analysis produces
five basic numbers: setup time (for each input relative to the clock that
samples it), hold time (for each input relative to the clock that samples
it), clock to output delay, propagation delay (for pure combinatorial input
to output paths) and clock frequency. Now go back to the code for your
testbench and make sure that you are
  - Not violating setup or hold time.
  - Not violating clock frequency.
  - Not blaming the post route model when you look at outputs going to 'X'
    at a time that is still within the clock to output delay or propagation
    delay.
- Peruse your source code for ANY usage of a data type other than std_logic
or std_ulogic. Enumerated types and integers are not illegal and can easily
be synthesized but they are susceptible to misuse. The misuse comes about
because in the simulation environment signals/variables of those types will
get 'magically' initialized...there is typically no such magic in a real
device. You can write code for a counter using type 'natural' that will
simulate just fine but when that 'natural' is translated into 'std_logic' as
it must be to be synthesized the output model will not 'work' and will sit
there as an unknown value because the original code had nothing to reset it
to a known value.
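
The 'natural' counter pitfall in the last point can be avoided by giving the counter an explicit reset, so that it never depends on the simulator's default initial value. A minimal sketch, assuming a synchronous active-high reset; the entity count_demo and its port names are hypothetical:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity, for illustration only.
entity count_demo is
  port (
    clk   : in  std_logic;
    reset : in  std_logic;               -- synchronous, active high (assumed)
    count : out natural range 0 to 15
  );
end entity count_demo;

architecture rtl of count_demo is
  signal cnt : natural range 0 to 15;    -- no reliance on a default value
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        cnt <= 0;                        -- explicit, known starting state
      elsif cnt = 15 then
        cnt <= 0;                        -- wrap explicitly; 15 + 1 is out of range
      else
        cnt <= cnt + 1;
      end if;
    end if;
  end process;

  count <= cnt;
end architecture rtl;
```

The testbench then has to assert reset for at least one rising clock edge before it starts checking outputs.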

Those are the tools you need to debug your problem. Disabling optimizations
is not in that list and will only lead you down a path that will result in
your final design not working anyway.

KJ

Weng Tianxiang

Sep 10, 2006, 12:14:07 PM

Hi,
1. I agree fully with KJ who is an experienced author in this group.
2. I never do any post-map simulating with all 6-8 projects I have
finished individually in Xilinx FPGA and all of them go to market
successfully.
3. While doing simulation, just check if logic design is correct, don't
have to check timing.
4. Let Xilinx compiler determine if the project meets its timing:
a. setup timing;
b. holding timing;
c. running frequency;
If Xilinx ISE tells you there is no timing violation in the above 3
categories, put the design in a chip, then test the board to see if
there is any logic design error.

Never spend time doing post-map simulation;
Never spend time using DOS command lines;
Never spend time turning off Xilinx's optimization;

Weng

jame...@yahoo.ca

Sep 10, 2006, 7:25:10 PM

Thanks for your replies, KJ, Weng. It is clear to
me that disabling optimization is not the real way
to fix my problem, but that using the coding style
that Xilinx likes is the way. Once that is
followed, no more problems will occur. I am just
not looking forward to the painful process of
trial and error that I have read will be required
by first-time Xilinx users to get the right coding
style. I have looked over the xst.pdf manual for
coding style. Removing all the latches was very
helpful when I did that prior to posting here,
back at my translate stage.

I am using only "std_logic". I will check my
synthesis report for warnings. I have no timing
violations listed as of this stage: post-map. I am
not at post-PAR yet. Why should I take the time to
place and route when post-map simulation doesn't
even work? I think doing that is for experienced
users who don't have trouble with the earlier
stages.

I am doing a lot of simultaneous "xor"s of
different bit ranges of 128-bit "words" and using
a function that uses a function (i.e.,
combinatorial logic) and I'm doing that
simultaneously as input to signals that are then
"xor"-ed. These are done after each clock cycle,
when initial signals are updated. That is, when
these initial signals are updated in a process at
the rising edge of my clock, then I have
additional signals that should just be updated
because the data has changed. I'm not using any
sensitivity list or any clock cycle for them.
These assignments should cause signals to change,
which cause the next set of signals to change, in
about three steps, with ranges of bits being
processed in parallel (and mixed, which is why I
have to get into bit ranges). Finally, signals
named "_next" are updated, then the next clock
cycle is awaited at which time the original
signals are updated from the "_next" signals.
Based on my experience so far in which I got into
trouble at the synthesize and translate stage due
to not having a clock on my ROM, do you think
putting clocks on everything would be the thing to
try? This is the trial and error that I will have
to go through.

Do you have any samples of this kind of VHDL code
that Xilinx likes, that you could show me?

Best regards,
-James

ankyag

Sep 10, 2006, 8:51:26 PM

> Do you have any samples of this kind of VHDL code
> that Xilinx likes, that you could show me?
>

In my experience with Xilinx and/or other FPGAs, the only kind of HDL
that these tools "don't like" are non-synthesizable constructs. For example,
"a<= a+1", where "a" is not a latched signal, or the "initial" blocks
in Verilog. So the best thing to do would be to read up on synthesis in
any standard HDL reference book pretty quickly. My suspicion is that
you haven't paid attention to this while writing the code.

The other possibility might be that you are setting your clock
frequency too high which causes some setup/hold time violations and
gives you all those "reds".
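
If the testbench is the culprit, one common way to stay clear of setup/hold violations in a timing simulation is to change the inputs on the clock edge opposite to the one that samples them. A minimal sketch, assuming a 20 ns clock period and a rising-edge design; every name here is hypothetical, and the single register merely stands in for the real design under test:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity tb_stimulus_demo is
end entity tb_stimulus_demo;

architecture sim of tb_stimulus_demo is
  constant CLK_PERIOD : time := 20 ns;   -- assumed; use your own figure
  signal clk  : std_logic := '0';
  signal din  : std_logic := '0';
  signal dout : std_logic;
  signal done : boolean := false;
begin
  -- Free-running clock that stops once the stimulus is finished.
  clk <= not clk after CLK_PERIOD / 2 when not done else '0';

  -- Stand-in for the design under test: a single rising-edge register.
  dut : process (clk)
  begin
    if rising_edge(clk) then
      dout <= din;
    end if;
  end process dut;

  stimulus : process
  begin
    for i in 0 to 7 loop
      -- Change inputs on the falling edge, half a period away from the
      -- sampling (rising) edge, so setup and hold margins are generous.
      wait until falling_edge(clk);
      din <= not din;
    end loop;
    done <= true;
    wait;
  end process stimulus;
end architecture sim;
```

If the "reds" persist with stimulus applied this way and a clock period comfortably above what the timing report allows, the testbench timing is less likely to be the cause.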

Best,
ankyag

Weng Tianxiang

Sep 10, 2006, 10:25:04 PM

Hi ankyag,
I widely use the equation like:
a <= a +1;

Usually a is an unsigned (or std_logic_vector) for a counter, it
doesn't matter whether the equation is in a process or in a concurrent
area.

No any problem.

I don't see why VHDL dislikes it or it cannot be synthesized.

Weng

ankyag

Sep 11, 2006, 12:04:05 AM

> I widely use the equation like:
> a <= a +1;
>
> Usually a is an unsigned (or std_logic_vector) for a counter, it
> doesn't matter whether the equation is in a process or in a concurrent
> area.
>
> No any problem.
>
> I don't see why VHDL dislikes it or it cannot be synthesized.

Sorry if my previous comment confused you. All I meant was it cannot be
used as a concurrent assignment (in VHDL) or in the "assign" statement
in Verilog. It is okay to use it within a "process" or "always" block.
In the latter case, a latch/flip-flop is inferred.

Hope this clears the confusion,
Ankur

David Ashley

Sep 11, 2006, 2:25:22 AM

Weng Tianxiang wrote:
> Never spend time doing post-map simulation;
> Never spend time using DOS command lines;
> Never spend time turning off Xilinx's optimization;

Weng,

Can you clarify the 2nd one about "DOS command lines"?
I'm using Xilinx WebPACK tools under Linux, operating
from the command line. Actually I've built up a Makefile
that invokes the commands. Is there some gotcha I need
to know about? I prefer command line tools operated by
"make" as opposed to IDEs.

Below are the important pieces of the Makefile. The commands
I got from the pacman source build script, converted to unix
make syntax. Works fine.

-Dave

XILINX=/Xilinx
NAME=main
SETUP=LD_LIBRARY_PATH=$(XILINX)/bin/lin XILINX=$(XILINX) \
	PATH=$(PATH):$(XILINX)/bin/lin

bitfile: step0 step1 step2 step3 step4 step5

# Recipe lines below must start with a tab character.
step0:
	$(SETUP) xst -ifn $(NAME).scr -ofn $(NAME).srp
step1:
	$(SETUP) ngdbuild -nt on -uc $(NAME).ucf $(NAME).ngc $(NAME).ngd
step2:
	$(SETUP) map -pr b $(NAME).ngd -o $(NAME).ncd $(NAME).pcf
step3:
	$(SETUP) par -w -ol high $(NAME).ncd $(NAME).ncd $(NAME).pcf
step4:
	$(SETUP) trce -v 10 -o $(NAME).twr $(NAME).ncd $(NAME).pcf
step5:
	$(SETUP) bitgen $(NAME).ncd $(NAME).bit -w #-f $(NAME).ut
hwtest:
	sudo xc3sprog $(NAME).bit

-----
main.scr contains this:

run
-ifn main.prj
-ifmt VHDL
-ofn main.ngc
-ofmt NGC -p XC3S500E-FG320-4
-opt_mode Area
-opt_level 2

------
main.prj just lists the vhd source files.

--
David Ashley http://www.xdr.com/dash
Embedded linux, device drivers, system architecture

KJ

Sep 11, 2006, 5:58:57 AM

"Weng Tianxiang" <wtx...@gmail.com> wrote in message
news:1157941504.5...@i42g2000cwa.googlegroups.com...

> I widely use the equation like:
> a <= a +1;
>
> Usually a is an unsigned (or std_logic_vector) for a counter, it
> doesn't matter whether the equation is in a process or in a concurrent
> area.

It matters very much whether it is in a clocked process or not. If 'a<=a+1'
is in an unclocked process or concurrent statement you've just created a
latch. As a general guideline if you ever have a signal on both sides of
the '<=' in an area outside of a clocked process you've got a latch.

> No any problem.

I doubt that. a<= a+1 outside of a clocked process will (at best) produce a
counter that increments by one at whatever uncontrolled propagation delay of
the device you have.

>
> I don't see why VHDL dislikes it or it cannot be synthesized.

It can be synthesized....it just is highly unlikely to do what you want it
to do.

KJ

Sep 11, 2006, 6:22:50 AM

"Weng Tianxiang" <wtx...@gmail.com> wrote in message

> Never spend time doing post-map simulation;

I wouldn't necessarily recommend this. What you need to do is first get
your experience up to a level where post-route simulation reveals no
surprises. The way to get that experience is to do a few of these in the
first place and then get a feel for where your code is not quite up to
snuff.

As an example, it's possible to get all the way through the build process
and have no errors or warnings and have it still not work simply because you
used a 'natural' to build a counter instead of unsigned (see my previous
post for more on when this can be a problem). In the hands of an
experienced designer, use of 'natural' can be better than 'unsigned';
outside of those experienced hands is quite a different story.

There are also times (like ASIC designs) or contracting where post-route sim
is required as a check off item that needs to be completed.

In any case, in the hands of an experienced designer doing an FPGA,
post-route sim can typically be skipped as you suggest, but as a general rule,
no, it cannot. In fact, in the case of the original poster of this thread,
the post-route simulation is waving the big red flag indicating that there
is something wrong either with his design or testbench.....that's a good
thing, better to find out sooner rather than later....the problem is that
instead of simply debugging to find the cause of the problem he seems to
want to flip whatever build time switches are available to make the problem
somehow disappear.

KJ

Weng Tianxiang

Sep 11, 2006, 9:12:46 AM

Hi KJ,
No, I disagree with you about that it would generate a latch.

Actually it is a combinational logic. Through one of Xilinx tools, you
can check that it just generates combinational logic. That is all.

No latch would be generated if 'a <= a+1;' is in a concurrent area and
even in a process without clock. I don't know Verilog, but know well
about VHDL.

Counter a with 'a <= a+1;' means same thing as a variable in a process,
but different in simulation: 'a' can be reviewed in simulation ModelSim
on every clock like any signal, but cannot be seen if it is a variable.

Weng

KJ

Sep 11, 2006, 9:57:31 AM

Weng Tianxiang wrote:
> >
> > It can be synthesized....it just is highly unlikely to do what you want it
> > to do.
> >
> > KJ
>
> Hi KJ,
> No, I disagree with you about that it would generate a latch.
>
> Actually it is a combinational logic. Through one of Xilinx tools, you
> can check that it just generates combinational logic. That is all.

You're right, it wouldn't be a typical latch but "a<=a+1" is a form of
combinatorial feedback (i.e. there is a combinatorial path from 'a'
back to itself) which while not really a latch is a form that one must
almost always avoid (no 'almost' inside FPGAs though). In any case
"a<=a+1" would be pretty useless if instantiated in a concurrent area.
What I also said was...

> > I doubt that. a<= a+1 outside of a clocked process will (at best) produce a
> > counter that increments by one at whatever uncontrolled propagation delay of
> > the device you have.

Think about it. For starters, how does 'a' get initialized to
anything? Ignoring that for the moment and assuming that 'a' was
somehow magically '0', at some time. Then the logic would be trying to
update a to be '1' (by virtue of the a<= a+1). But now a is '1' and
will want to be updated to be '2', and then '3', '4', etc. All seems
well, it's a counter after all....But since this is not in a clocked
process then 'a' would be changing at whatever propagation delay there
is in computing 'a+1'....which is useless. In a real device those
outputs probably wouldn't even resemble a counter either. In any
simulation environment you'll error out with an iteration limit error
because signal 'a' will never settle down (again assuming that it ever
got to be defined in the first place).

>
> Counter a with 'a <= a+1;' means same thing as a variable in a process,
> but different in simulation: 'a' can be reviewed in simulation ModelSim
> on every clock like any signal, but cannot be seen if it is a variable.

What clock? You said we're in a concurrent statement!

I think what you really mean to say is....

b <= a+1; -- But this is updating a new signal called 'b', not 'a'.
process(clock)
begin
  if rising_edge(clock) then
    a<=b;
  end if;
end process;

where the 'b<=a+1' is the concurrent statement.

Personally though, I would've written it as

process(clock)
begin
  if rising_edge(clock) then
    a<=a+1;
  end if;
end process;

But in either case, we're talking about the counter being implemented
in a synchronous process, "a<=a+1" in a concurrent statement won't
work.

KJ

Weng Tianxiang

Sep 11, 2006, 12:01:29 PM

Hi David,
I never use DOS commands and all options are accessible through Xilinx
ISE window system so that I don't know how to answer any questions
about it.

Weng

Weng Tianxiang

Sep 11, 2006, 12:06:06 PM

Hi KJ,

b <= a+1; -- But this is updating a new signal called 'b', not 'a'.
process(clock)
begin
  if rising_edge(clock) then
    a<=b;
  end if;
end process;

where the 'b<=a+1' is the concurrent statement.

You are right.

Weng

David Ashley

Sep 11, 2006, 1:20:00 PM

Weng Tianxiang wrote:
> Hi David,
> I never use DOS commands and all options are accessible through Xilinx
> ISE window system so that I don't know how to answer any questions
> about it.
>
> Weng

Aha! So your list was just describing your own way of development,
it wasn't meant as advice as to how best to do development.

Thanks--
Dave

jame...@yahoo.ca

Sep 11, 2006, 1:46:45 PM

Hi All,

What do people think of my idea from my post of
Sun, Sep 10 2006 7:25 pm? I have a description of
what I am doing, followed by a question:

I am trying this. Does anyone have some (simple to
them) samples of VHDL code along these lines that
succeed in a Xilinx FPGA?

Best regards,
-James

KJ

Sep 11, 2006, 3:33:59 PM

> >I am doing a lot of simultaneous "xor"s of
> >different bit ranges of 128-bit "words" and using
> >a function that uses a function (i.e.,
> >combinatorial logic) and I'm doing that
> >simultaneously as input to signals that are then
> >"xor"-ed. These are done after each clock cycle,
> >when initial signals are updated. That is, when
> >these initial signals are updated in a process at
> >the rising edge of my clock, then I have
> >additional signals that should just be updated
> >because the data has changed. I'm not using any
> >sensitivity list or any clock cycle for them.
> >These assignments should cause signals to change,
> >which cause the next set of signals to change, in
> >about three steps, with ranges of bits being
> >processed in parallel (and mixed, which is why I
> >have to get into bit ranges). Finally, signals
> >named "_next" are updated, then the next clock
> >cycle is awaited at which time the original
> >signals are updated from the "_next" signals.

It sounds like relatively straightforward logic and registers. There
are absolutely no issues with using any design language to implement
this.

> >Based on my experience so far in which I got into
> >trouble at the synthesize and translate stage due
> >to not having a clock on my ROM, do you think
> >putting clocks on everything would be the thing to
> >try?

No. Unless the outputs are required to be clocked for some other reason,
no clock is required. A couple of reasons why you might want to clock the
outputs would be...
- More consistent timing on when the outputs become available (i.e. the
clock to output delay generally doesn't change much if the final
outputs are clocked)
- No glitching on the outputs. The output of a flip flop will either
change or remain the same for the entire clock cycle whereas the output
of combinatorial logic implemented inside an FPGA might glitch during
the propagation delay while the new output value is being computed.

That doesn't imply that clocked outputs are 'better' or 'worse'; you just
need to be aware of what will come out. As another very general statement,
there is usually absolutely no need for internal signals to be clocked
except to improve clock cycle performance. Since you've provided no
information regarding what speed you need to run at, I'd say that there
is no speed issue at present.

>
> I am trying this. Does anyone have some (simple to
> them) samples of VHDL code along these lines that
> succeed in a Xilinx FPGA?

Not sure exactly what you were trying to describe but my interpretation
is that it is something of the form...

y <= Fun1(Fun2(X1(100 downto 83))) xor X2(17 downto 0);

where Fun1 and Fun2 are your 'function of a function', X1 and X2 are
some inputs of some sort and you're Xor-ing them together. I'm sure I
didn't guess right, but on the off chance that it is correct then 'yes'
the above line of code will work just fine for what you're trying to
do. If you want a clocked output y then

process(Clock)
begin
  if rising_edge(Clock) then
    y <= Fun1(Fun2(X1(100 downto 83))) xor X2(17 downto 0);
  end if;
end process;

All of this will work. You need to sit down and write the logic
equations for whatever it is you're trying to implement; there shouldn't
be any need for any trial and error.
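
To make the register-plus-"_next" structure described in the quoted post concrete, here is a minimal sketch under stated assumptions. The entity xor_round_demo, its ports, the 32-bit word split and the rotl8 function are all hypothetical placeholders; the poster's actual functions and bit ranges are not shown in this thread.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity, for illustration only.
entity xor_round_demo is
  port (
    clk      : in  std_logic;
    reset    : in  std_logic;
    load     : in  std_logic;
    data_in  : in  std_logic_vector(127 downto 0);
    data_out : out std_logic_vector(127 downto 0)
  );
end entity xor_round_demo;

architecture rtl of xor_round_demo is
  signal state      : std_logic_vector(127 downto 0);
  signal state_next : std_logic_vector(127 downto 0);
  signal mixed      : std_logic_vector(31 downto 0);

  -- Placeholder for the poster's combinatorial function: rotate left by 8.
  function rotl8 (w : std_logic_vector(31 downto 0)) return std_logic_vector is
  begin
    return w(23 downto 0) & w(31 downto 24);
  end function rotl8;
begin
  -- Combinatorial part: concurrent assignments, re-evaluated whenever
  -- 'state' changes. No clock and no sensitivity list are needed here.
  mixed <= rotl8(state(127 downto 96)) xor state(31 downto 0);

  state_next(127 downto 96) <= state(95 downto 64);
  state_next(95 downto 64)  <= state(63 downto 32);
  state_next(63 downto 32)  <= state(31 downto 0);
  state_next(31 downto 0)   <= mixed;

  -- Registered part: the only place where a signal feeds back onto itself.
  process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        state <= (others => '0');
      elsif load = '1' then
        state <= data_in;
      else
        state <= state_next;
      end if;
    end if;
  end process;

  data_out <= state;
end architecture rtl;
```

Every combinatorial signal here is driven only from registered values or inputs, so there is no combinatorial feedback of the "a<=a+1" kind discussed earlier in the thread.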

KJ

jame...@yahoo.ca

Sep 11, 2006, 10:55:15 PM

Thanks for your reply. I was skeptical myself.
I do have my equations and my VHDL code. Behavioural simulates
correctly, post-synthesis and translate simulates correctly, then
post-map fails simulation abysmally. About 140-150 lines reporting
removed "redundant logic" were reported by the mapper, in addition to
two lines indicating VCC and GND were "optimized" away (see copy of
first lines of output in my first post). I suspect that it is the
removed logic that is needed to make the simulation work, because when
I use "keep" statements, the simulation improves greatly, but is still
half wrong at an early stage. However, I don't want to use kludges, I
want the tool to recognize it is all needed from the way I write the
VHDL. Your sample VHDL is pretty much what I have; I just have a lot of
simultaneous such lines at each timing period for different bit ranges
going to different bit ranges. What could be causing everything to work
fine at the first three stages and then have the post-map stage fail
its simulation so badly? I certainly suspect all the removed "redundant
logic" that the mapper is reporting. But how to indicate it is not
really redundant, without using "keep" and "save" statements
everywhere? I can't even complain that all my signals are connected
(they are), because that's not the problem: the mapper is not removing
"unused logic".

Best regards,
-James

KJ

Sep 12, 2006, 6:18:55 AM

<jame...@yahoo.ca> wrote in message
news:1158029715.4...@d34g2000cwd.googlegroups.com...

> What could be causing everything to work fine at the first three stages
> and then have the post-map stage fail its simulation so badly?

A number of things....things I mentioned in earlier posts and see below for
a quick summary.

> I certainly suspect all the removed "redundant logic" that the mapper
> is reporting.

But you have no basis for that suspicion. It might be the case that the
synthesis process has a bug but you need to prove it.....and then open a
service request with the company whose tool has the bug in it. My point is
don't let your objectivity be clouded by what you suspect; debug and prove.

> But how to indicate it is not really redundant, without using "keep"
> and "save" statements everywhere?

And this is where you start spinning your wheels (in my opinion). Instead
of simply debugging the post-map sim to the source of the discrepancy you're
trying things based on a suspicion that is not proven. Let's say for the
sake of argument that your suspicion is wrong about the removal of redundant
logic and that the problem is a timing issue with your testbench instead.
That would mean that every minute you spend chasing 'keep' and 'saves' etc.
was wasted time.

> I can't even complain that all my signals are connected (they
> are), because that's not the problem: the mapper is not removing
> "unused logic".
>

This will sound like a dumb question on my part but what is the distinction
in your mind between 'redundant logic' and 'unused logic'? The reason for
my confusion at this point would be because you say the 'redundant' stuff is
getting removed and yet there is no 'unused' logic getting removed. If by
'redundant' you mean the classical Boolean Logic 101 definition where you
add redundant logic to act as 'cover' terms in your Karnaugh map to avoid
race conditions then that is the most likely cause of your problems. Is
this the type of logic that you are trying to 'keep' but is being mapped
away as an 'optimization'? If it is, then the rest of this post probably
doesn't apply and we can discuss this point further, but if it is not then
keep on reading.

One other source of 'optimization' is that an output of some entity is not
really used. Say the logic for equation 'x' happens to reduce down to always
being 'false'. This means that everything downstream of 'x' that depends on
'x' being true can never happen, so it can be optimized away. That is not
the fault of the optimizer removing redundant logic; that's the way the
original is coded. You probably already realize this, but I thought a quick
'Optimization 101' wouldn't hurt....but I also don't think focusing on what
is being optimized away is the way you need to go on this one (which is
the reason I asked you "Why..." in my first post).

What you need to do is to simulate the post-map VHDL file and trace it back
to why output signal 'x' at time t is set to '0' but when you use your
original code it is '1'. Use the sim results from using your original code
as your guide for what 'should' happen and the post-map VHDL simulation for
what is actually happening, and debug the problem.

It could be that
- There is some bug in the translation tool
- Could be some setting in your build process
- Could be timing related (i.e. your testbench is violating the setup/hold
time requirements for the post-map model)
- Probably other things too

In any case, treat the fully post-map model as something to debug and find
out the reason for the discrepancy and go from there.

KJ

jame...@yahoo.ca

Sep 12, 2006, 11:34:10 AM

KJ wrote:
> <james..yahoo.ca> wrote:

> > the post-map stage fails its simulation so badly

>
> > I certainly suspect all the removed
> > "redundant logic" that the mapper is reporting.
> But you have no basis for that suspicion. It might be the case that the
> synthesis process has a bug but you need to prove it.....and then open a
> service request on the company that has the bug in it. My point is don't
> let your objectivity be clouded by what you suspect, debug and prove.

I do have a basis, as I wrote previously: using the "keep"
statements shortened the mapper's list of removed "redundant"
logic and dramatically improved the success of the post-map
simulation. I would indeed, however, like to avoid using this as a
crutch and do it properly as you indicate. I would like to
find out the correct way to indicate in the VHDL, by the
way I write the VHDL, that the logic is not redundant.

>
> > But how to indicate it
> > is not
> > really redundant, without using "keep" and "save" statements
> > everywhere?
> And this is where you start spinning your wheels (in my opinion). Instead
> of simply debugging the post-map sim to the source of the discrepancy you're
> trying things based on a suspicion that is not proven. Let's say for the
> sake of argument that your suspicion is wrong about the removal of redundant
> logic and that the problem is a timing issue with your testbench instead.
> That would mean that every minute you spend chasing 'keep' and 'saves' etc.
> was wasted time.

I'm not really arguing for using those crutches; I'm seriously
asking what to do so that I don't need them.

>
> > I
> > can't even complain that all my signals are connected (they
> > are), because that's not the problem: the mapper is not removing
> > "unused logic".
> >
> This will sound like a dumb question on my part but what is the distinction
> in your mind between 'redundant logic' and 'unused logic'? The reason for
> my confusion at this point would be because you say the 'redundant' stuff is
> getting removed and yet there is no 'unused' logic getting removed. If by
> 'redundant' you mean the classical Boolean Logic 101 definition where you
> add redundant logic to act as 'cover' terms in your Karnaugh map to avoid
> race conditions then that is the most likely cause of your problems. Is
> this the type of logic that you are trying to 'keep' but is being mapped
> away as an 'optimization'? If it is, then the rest of this post probably
> doesn't apply and we can discuss this point further, but if it is not then
> keep on reading.

The "redundant" and "unused" logic terms I am copying from the mapper
report and Xilinx documentation. The mapper report (see my
first post) says "redundant" logic is being removed, not "unused
logic". From my reading of the Xilinx manuals I understand that
"unused logic" means logic that is not connected to anything, so it
can be removed (this latter is not what is happening to me).
However, I haven't found anything in the manuals that explains what
"redundant logic" is or how to write the code to avoid it. I have a lot
of identical ROMs that I use to do parallel processing; those were
being removed in the synthesis and translate step due to not having
clocks on them. So my mind is pretty much a blank as to what is
meant by "redundant" logic, other than the common meaning that it
is repetitive -- but it isn't really, of course, because I'm using them
simultaneously for different data.

>
> One other source of 'optimization' is that an output of some entity is not
> really used. The logic for equation 'x' happens to reduce down to always
> being 'false'. This means that everything downstream of 'x' that depends on
> 'x' being true can never happen so it can be optimized away. It's not the
> fault of the optimizer removing redundant logic that's the way the original
> is coded. You probably already realize this but thought a quick
> 'Optimization 101' wouldn't hurt....but I also don't think focusing on what
> is being optimized away is the way you need to go on this one (which is
> the reason on my first post I questioned you "Why...").
>
> What you need to do is to simulate the post-map VHDL file and trace it back
> to why output signal 'x' at time t is set to '0' but when you use your
> original code it is '1'. Use the sim results from using your original code
> as your guide for what 'should' happen and the post-map VHDL simulation for
> what is actually happening, and debug the problem.

I agree that finding out what is going on is the best
approach. Do you have any debugging tips other than comparing
the simulation results in detail and seeing what logic calculations
must be getting removed?

>
> It could be that
> - There is some bug in the translation tool
> - Could be some setting in your build process
> - Could be timing related (i.e. your testbench is violating the setup/hold
> time requirements for the post-map model)
> - Probably other things too
>
> In any case, treat the fully post-map model as something to debug and find
> out the reason for the discrepancy and go from there.

Thank you very much for your input. I really appreciate
the time you are spending to try to help me.

Best regards,
-James

KJ

Sep 12, 2006, 12:42:25 PM

jame...@yahoo.ca wrote:
> >
> > > But how to indicate it
> > > is not
> > > really redundant, without using "keep" and "save" statements
> > > everywhere?
> > And this is where you start spinning your wheels (in my opinion). Instead
> > of simply debugging the post-map sim to the source of the discrepancy you're
> > trying things based on a suspicion that is not proven. Let's say for the
> > sake of argument that your suspicion is wrong about the removal of redundant
> > logic and that the problem is a timing issue with your testbench instead.
> > That would mean that every minute you spend chasing 'keep' and 'saves' etc.
> > was wasted time.
>
> I'm not really arguing for using those crutches; I'm seriously
> asking what to do so that I don't need them.

Like I said, what you have to do is debug the 'optimized post-map'
simulation model in the simulation environment to find out just exactly
when it differs from the original code and then backtrack through the
logic in the post-map design to find out why that is.

There really are no shortcuts to this process other than the things I
mentioned in earlier posts (like maybe the testbench is 'violating'
timing, use of things other than std_ulogic/std_logic, etc.).

>
> The "redundant" and "unused" logic terms I am copying from the mapper
> report and Xilinx documentation. The mapper report (see my
> first post) says "redundant" logic is being removed, not "unused
> logic".
> From my reading of the Xilinx manuals I understand that "unused logic"
> means logic that is not connected to anything, so it can be removed
> (this latter is not what is happening to me).
> However, I haven't found anything in the manuals that explains what
> "redundant logic" is or how to write the code to avoid it.

A simple example of the 'redundant' logic that I was asking about,
something that one might decide to put in to avoid race conditions, is
the following code, which implements a transparent latch (by the way, do
not implement this in real code in an FPGA).
Q <= (en and D) -- #1
or (not(en) and Q) -- #2
or (D and Q); -- #3

The point is that #3 is a redundant logic term and any synthesis tool
will be able to recognize this and remove it. If you remember how to
do Karnaugh maps this example is also easy enough to see for
yourself. If you don't know about Karnaugh maps just take my word on
it that #1 and #2 are 'logically' all you need. Term #3 is something
that you would need to put into any actual implementation because
#1 and #2 alone, although logically complete, have a race
condition when 'en' is switching.
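
For anyone who wants to check the redundancy claim mechanically rather
than with a Karnaugh map, here is a minimal self-checking testbench
sketch. It assumes the cover term is the consensus of #1 and #2, i.e.
(D and Q); the entity name consensus_check_tb is made up for
illustration. It only compares the two Boolean expressions over all
eight input combinations; it does not model the latch feedback or any
timing.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical self-checking testbench; the name is illustrative only.
entity consensus_check_tb is
end entity consensus_check_tb;

architecture sim of consensus_check_tb is
begin
  check : process
    type sl_pair is array (0 to 1) of std_logic;
    constant VALUES : sl_pair := ('0', '1');
    variable en, d, q                 : std_logic;
    variable two_terms, three_terms   : std_logic;
  begin
    -- Enumerate every combination of en, d and q.
    for i in VALUES'range loop
      for j in VALUES'range loop
        for k in VALUES'range loop
          en := VALUES(i);
          d  := VALUES(j);
          q  := VALUES(k);
          -- Terms #1 and #2 only.
          two_terms   := (en and d) or (not en and q);
          -- Terms #1, #2 and the assumed cover term (d and q).
          three_terms := (en and d) or (not en and q) or (d and q);
          assert two_terms = three_terms
            report "cover term changed the logic function"
            severity error;
        end loop;
      end loop;
    end loop;
    report "all 8 input combinations checked";
    wait;
  end process check;
end architecture sim;
```

If no assertion fires, the extra term is invisible to a purely logical
optimizer, which is why a tool is free to drop it.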

My whole reason for bringing this up was just to rule out the
possibility that this is what you meant by 'redundant'. I didn't think
it was, but just wanted to confirm. Moving on.

> I have a lot
> of identical ROMs that I use to do parallel processing; those were
> being removed in the synthesis and translate step due to not having
> clocks on them.

I don't doubt what you say, but I also don't quite understand why ROMs
would be 'removed' either. Maybe all you meant is that you
couldn't find specific entities in the post-map VHDL that equated to
the various 'ROMs' that you instantiated in the original code....but
that's OK; a ROM is simply an array of constants, and I would expect
those to get rolled right into the logic. I can see where targeting a
particular family might mean using logic blocks instead of embedded
memory to implement what your code says (embedded memory could be used
if you chose to implement a clocked ROM), but that doesn't mean that
the original unclocked ROM is not synthesizable at all.

> So my mind is pretty much a blank as to what is
> meant by "redundant" logic, other than the common meaning that it
> is repetitive -- but it isn't really, of course, because I'm using them
> simultaneously for different data.

'Redundant' in this context generally means that the fitter found that
you have two equations that are logically equivalent. An example...

d <= a or b or c;
h <= e or f or g;
....
a <= e;
b <= f;
c <= g;

The signal 'h' is redundant because it is logically equivalent to 'd':
although the signals used to calculate 'h' appear to be different,
from a logic perspective they are identical because of the 'a <= e....'
assignments.
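
As a rough illustration of that equivalence, the sketch below wraps the
example in a small entity and sweeps all eight input combinations. The
entity and testbench names (dup_logic, dup_logic_tb) and the 10 ns step
are assumptions made for the example, not anything from the actual
design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical wrapper around the example; signal names follow the post.
entity dup_logic is
  port (
    e, f, g : in  std_logic;
    d, h    : out std_logic
  );
end entity dup_logic;

architecture rtl of dup_logic is
  signal a, b, c : std_logic;
begin
  a <= e;
  b <= f;
  c <= g;
  d <= a or b or c;
  h <= e or f or g;  -- same function of e, f, g as d
end architecture rtl;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity dup_logic_tb is
end entity dup_logic_tb;

architecture sim of dup_logic_tb is
  signal e, f, g, d, h : std_logic;
begin
  dut : entity work.dup_logic
    port map (e => e, f => f, g => g, d => d, h => h);

  stim : process
    variable v : std_logic_vector(2 downto 0);
  begin
    for i in 0 to 7 loop
      v := std_logic_vector(to_unsigned(i, 3));
      e <= v(2);
      f <= v(1);
      g <= v(0);
      wait for 10 ns;  -- let the combinational outputs settle
      assert d = h report "d and h differ" severity error;
    end loop;
    report "d matched h for all 8 input combinations";
    wait;
  end process stim;
end architecture sim;
```

Since 'd' and 'h' can never be told apart from the outside, a tool can
implement one of them and drive both outputs from it.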

>
> I agree that finding out what is going on is the best
> approach. Do you have any debugging tips other than comparing
> the simulation results in detail and seeing what logic calculations
> must be getting removed?
>

None, other than the ones listed below and in previous posts. Tweaking
the 'no optimize' switches won't get you to the bottom of what ails your
sim. It might just postpone the inevitable when you might find that
your design doesn't work on real hardware.

If the problem is actually in your testbench in how you generate inputs
to your design (i.e. meeting the timing requirements of the post-map
design) then this should be relatively straightforward to fix. In
fact, this is a fairly common reason for why 'post' does not match
'pre' simulation results.

If nothing else it is probably much quicker to verify testbench timing
than to debug back through the post-map design....but that just means
you should look at that first. If that's not the problem then you need
to debug.
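
A rough sketch of what checking testbench timing could look like is
below. The constants CLK_PERIOD, T_SETUP and T_HOLD are placeholders;
the real numbers would have to come from the timing analysis report for
the design. The idea is simply to change inputs a safe distance after
the active clock edge and to have the testbench complain if an input
moves inside the setup window.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity stim_timing_tb is
end entity stim_timing_tb;

architecture sim of stim_timing_tb is
  -- Hypothetical numbers; take real values from the timing report.
  constant CLK_PERIOD : time := 10 ns;
  constant T_SETUP    : time := 2 ns;
  constant T_HOLD     : time := 1 ns;

  signal clk : std_logic := '0';
  signal din : std_logic := '0';
begin
  clock : process
  begin
    for cycle in 1 to 8 loop
      clk <= '0'; wait for CLK_PERIOD / 2;
      clk <= '1'; wait for CLK_PERIOD / 2;
    end loop;
    wait;
  end process clock;

  -- Change the input only after the hold window has passed, which
  -- also leaves most of the period before the next rising edge.
  stim : process
  begin
    for cycle in 1 to 6 loop
      wait until rising_edge(clk);
      wait for T_HOLD + 1 ns;
      din <= not din;
    end loop;
    wait;
  end process stim;

  -- Flag any input change that lands inside the setup window.
  check : process (clk)
  begin
    if rising_edge(clk) then
      assert din'stable(T_SETUP)
        report "setup violation: din changed too close to the clock edge"
        severity error;
    end if;
  end process check;
end architecture sim;
```

A testbench that changes inputs exactly on the clock edge usually works
in behavioral simulation but can misbehave once the model has real
delays in it.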

> >
> > It could be that
> > - There is some bug in the translation tool
> > - Could be some setting in your build process
> > - Could be timing related (i.e. your testbench is violating the setup/hold
> > time requirements for the post-map model)
> > - Probably other things too
> >
> > In any case, treat the fully post-map model as something to debug and find
> > out the reason for the discrepancy and go from there.
>
> Thank you very much for your input. I really appreciate
> the time you are spending to try to help me.
>

Good luck, not sure I'm helping much.

KJ

David Ashley

Sep 12, 2006, 12:47:37 PM

jame...@yahoo.ca wrote:
> From my reading of the Xilinx manuals I understand that "unused logic"
> means logic that is not connected to anything, so it can be removed
> (this latter is not what is happening to me).
> However, I haven't found anything in the manuals that explains what
> "redundant logic" is or how to write the code to avoid it. I have a lot
> of identical ROMs that I use to do parallel processing; those were
> being removed in the synthesis and translate step due to not having
> clocks on them. So my mind is pretty much a blank as to what is
> meant by "redundant" logic, other than the common meaning that it
> is repetitive -- but it isn't really, of course, because I'm using them
> simultaneously for different data.

James,

Maybe there are switches to the synthesizer that would allow
turning off the optimization?

I would tend to agree that looking for bugs in the toolchain might
not be the best way to work through this.

I haven't been following this thread all along, but one thing occurs
to me. I'm new to VHDL and have settled in to an approach where
I make little incremental changes, then immediately test and verify
something didn't break. That way I can go back and the source of
the problem is obvious, because there is only a little bit of code to
examine.

In your case it's like maybe the sequence is
working, change code
working, change code
working, change code
working, change code
broken, change code <-- it broke here but you didn't discover it
broken, change code
broken, change code
broken, change code
broken <--- you're here

It's just a theory. But I've seen this sort of thing before. The
most recent change didn't cause the problem and in fact
couldn't have caused the problem, but it's not working.
Therefore the tools must be broken. Really the problem
occurred earlier...

Sorry to intrude...

-Dave

jame...@yahoo.ca

Sep 12, 2006, 10:55:45 PM

Here's a handy link to this whole thread, provided by Google:
http://groups.google.com/group/comp.arch.fpga/browse_thread/thread/6d594b2ab04beb4b/e39055a323c18cd6#e39055a323c18cd6

KJ wrote:

Yes, I'm familiar with Karnaugh maps and I understand the point.
Remember, I am past synthesis and my problem is in the mapper,
going from .NGD (Native Generic Database) to .NCD (Native Circuit
Description) files. Does this redundant logic removal process you
just described happen at this stage? Remember, the "Redundant"
terminology is Xilinx's, not mine, and it is being invoked by the
mapper. I am just wondering what Xilinx means by "Redundant
Blocks" (sic) of logic. This terminology can be seen in the section
from the mapper that I included with my first post.

>
> > I have a lot
> > of identical ROMs that I use to do parallel processing; those were
> > being removed in the synthesis and translate step due to not having
> > clocks on them.
> I don't doubt what you say but I also don't quite understand why ROMs
> would be 'removed' either. Maybe all you meant is that you
> couldn't find specific entities in the post-map VHDL that equated to
> the various 'ROMs' that you instantiated in the original code....but
> that's OK, a ROM is simply an array of constants, I would expect those
> to get rolled right into the logic. I can see where targeting a
> particular family might have to use logic blocks instead of embedded
> memory to implement what your code says (but could use embedded memory
> if you chose to implement a clocked ROM) but that doesn't mean that
> the original unclocked ROM is not synthesizable at all.

The explanation I received was that without a clock, they
were being interpreted as asynchronous RAMs and were
optimized away. Further explanation was not given to me.
That was happening at the Translate step,
which was the previous step to the mapping step, and is fixed.

>
> > So my mind is pretty much a blank as to what is
> > meant by "redundant" logic, other than the common meaning that it
> > is repetitive -- but it isn't really, of course, because I'm using them
> > simultaneously for different data.
> 'Redundant' in this context generally means that the fitter found that
> you have two equations that are logically equivalent. An example...
>
> d <= a or b or c;
> h <= e or f or g;
> ....
> a <= e;
> b <= f;
> c <= g;
>
> The signal 'h' is redundant since it is logically equivalent to 'd'
> since, although the signals appear to be different for calculating 'h',
> from a logic perspective they are identical because of the 'a<= e....'
> assignments.

Does the mapper really do this? Is this what Xilinx means by
"Redundant Blocks" of logic at the mapping stage?

Thanks again,

Best regards,
-James

jame...@yahoo.ca

Sep 12, 2006, 11:03:05 PM

David Ashley wrote:
> James,
>
> Maybe there are switches to the synthesizer that would allow
> turning off the optimization?

Yes, but absolutely nothing is for turning off optimization of
"Redundant Blocks" (sic)* of logic; everything is for turning off
removal of "Unused" logic. The mapper -u option, the "keep" constraint
and the "save" constraint, are all for preventing removal of "Unused"
logic, not "Redundant Blocks" (sic)* of logic. It's enough to make me
tear my hair out. Anyway, as you can read from the other posts,
doing that is a kludge and at best a debugging step to identify
the problem area, not the real way I want to solve the problem.

*See mapper report in my first post in this thread.

> I would tend to agree that looking for bugs in the toolchain might
> not be the best way to work through this.
>
> I haven't been following this thread all along, but one thing occurs
> to me. I'm new to VHDL and have settled in to an approach where
> I make little incremental changes, then immediately test and verify
> something didn't break. That way I can go back and the source of
> the problem is obvious, because there is only a little bit of code to
> examine.
>
> In your case it's like maybe the sequence is
> working, change code
> working, change code
> working, change code
> working, change code
> broken, change code <-- it broke here but you didn't discover it
> broken, change code
> broken, change code
> broken, change code
> broken <--- you're here
>
> It's just a theory. But I've seen this sort of thing before. The
> most recent change didn't cause the problem and in fact
> couldn't have caused the problem, but it's not working.
> Therefore the tools must be broken. Really the problem
> occurred earlier...

I think I may very well have to try that, building up my
project piece by piece.

>
> Sorry to intrude...
> -Dave

Not at all. I'm grateful for your input.

Best regards,
-James

KJ

Sep 13, 2006, 5:56:02 AM

<jame...@yahoo.ca> wrote in message
news:1158116145.0...@h48g2000cwc.googlegroups.com...

>>
>> > I have a lot
>> > of identical ROMs that I use to do parallel processing; those were
>> > being removed in the synthesis and translate step due to not having
>> > clocks on them.
>> I don't doubt what you say but I also don't quite understand why ROMs
>> would be 'removed' either. Maybe all you meant is that you
>> couldn't find specific entities in the post-map VHDL that equated to
>> the various 'ROMs' that you instantiated in the original code....but
>> that's OK, a ROM is simply an array of constants, I would expect those
>> to get rolled right into the logic. I can see where targeting a
>> particular family might have to use logic blocks instead of embedded
>> memory to implement what your code says (but could use embedded memory
>> if you chose to implement a clocked ROM) but that doesn't mean that
>> the original unclocked ROM is not synthesizable at all.
>
> The explanation I received was that without a clock, they
> were being interpreted as asynchronous RAMs and were
> optimized away.

Well, whatever is 'optimizing' them away has a bug in it if the output
is now 'different' because of that optimization. Like I said, an async ROM
is simply a table of constants. Synthesis tools are very good at optimizing
constants (as they should be). It wouldn't surprise me at all that...
- You wouldn't be able to 'find' the ROM after mapping to a particular part
because the result of those constants has been integrated into whatever
downstream logic the ROM was feeding.
- The implementation might (probably would) use more logic resources and
none of the internal memory if the targeted part requires a clock in order
to be able to map it into one of those internal memories.

In any case, the overall function has not changed, so it should simulate
the same. If not, then a simple test case and a service request to Xilinx
might be in order.

> Further explanation was not given to me.
> That was happening at the Translate step,
> which was the previous step to the mapping step, and is fixed.

Not sure I would call it 'fixed' (unless what was 'broken' was just the
ability to use internal memory which as mentioned above is not really a
functional issue but one of trying to properly use internal resources to
implement a given function). Anyway, moving on.

>>
>> 'Redundant' in this context generally means that the fitter found that
>> you have two equations that are logically equivalent. An example...
>>
>> d <= a or b or c;
>> h <= e or f or g;
>> ....
>> a <= e;
>> b <= f;
>> c <= g;
>>
>> The signal 'h' is redundant since it is logically equivalent to 'd'
>> since, although the signals appear to be different for calculating 'h',
>> from a logic perspective they are identical because of the 'a<= e....'
>> assignments.
>
> Does the mapper really do this?

Yes, as it should. Remember, 'logic' does not care about propagation delays,
and from the standpoint of transforming the source code into an
implementation these things can legally be combined as redundant since they
(in this case 'd' and 'h') perform exactly the same function. You wouldn't
be able to tell from the outside which is 'd' and which is 'h' by wiggling
the inputs 'e', 'f' or 'g'. Another simple example is
x <= not(y0);
y0 <= not(y1);
y1 <= not(y2);
y2 <= not(y);

which is equivalent to x <= not(not(not(not(y))));
which is equivalent to x <= y;
which when implemented in an FPGA would not even use a single logic
resource. Whatever logic in the original source needed 'x' or 'y'
would get the same signal.

> Is this what Xilinx means by
> "Redundant Blocks" of logic at the mapping stage?

I believe so, but haven't had the need to dig any deeper.

Brian Drummond

Sep 13, 2006, 7:54:49 AM

On 12 Sep 2006 08:34:10 -0700, jame...@yahoo.ca wrote:

>> What you need to do is to simulate the post-map VHDL file and trace it back
>> to why output signal 'x' at time t is set to '0' but when you use your
>> original code it is '1'. Use the sim results from using your original code
>> as your guide for what 'should' happen and the post-map VHDL simulation for
>> what is actually happen and debug the problem.
>
>I agree that finding out what is going on is the best
>approach. Do you have any debugging tips other than comparing
>the simulation results in detail and seeing what logic calculations
>must be getting removed?

One tip: instantiate both behavioural and post-map modules in your
testbench, and run them in parallel. You can assert on differences in
the outputs, and trace internal signals in the wave window (to the
extent that you can still recognize internal signals).

Possibly also set breakpoints on differences in internal signals which
ought to be the same.
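
A minimal sketch of that side-by-side arrangement follows. Everything
in it is a stand-in: xor_stage is a toy design, 'behavioral' plays the
part of the original source, and 'postmap_standin' plays the part of
the post-map netlist (in a real project the two models would need to be
compiled under distinct architecture names or into separate libraries).
Inputs change on the falling clock edge, well away from the rising edge
that the registers use.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Toy design with two architectures; all names are hypothetical.
entity xor_stage is
  port (
    clk  : in  std_logic;
    a, b : in  std_logic;
    q    : out std_logic
  );
end entity xor_stage;

architecture behavioral of xor_stage is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      q <= a xor b;
    end if;
  end process;
end architecture behavioral;

architecture postmap_standin of xor_stage is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      q <= (a and not b) or (not a and b);
    end if;
  end process;
end architecture postmap_standin;

library ieee;
use ieee.std_logic_1164.all;

entity compare_tb is
end entity compare_tb;

architecture sim of compare_tb is
  signal clk          : std_logic := '0';
  signal a, b         : std_logic := '0';
  signal q_ref, q_map : std_logic;
  signal done         : boolean := false;
begin
  -- Same stimulus drives both models.
  ref_model : entity work.xor_stage(behavioral)
    port map (clk => clk, a => a, b => b, q => q_ref);

  map_model : entity work.xor_stage(postmap_standin)
    port map (clk => clk, a => a, b => b, q => q_map);

  clock : process
  begin
    while not done loop
      clk <= '0'; wait for 5 ns;
      clk <= '1'; wait for 5 ns;
    end loop;
    wait;
  end process clock;

  stim : process
  begin
    for i in 0 to 3 loop
      wait until falling_edge(clk);
      if i >= 2 then a <= '1'; else a <= '0'; end if;
      if i mod 2 = 1 then b <= '1'; else b <= '0'; end if;
    end loop;
    wait until falling_edge(clk);
    done <= true;
    wait;
  end process stim;

  -- Compare half a period after the registers were clocked.
  compare : process (clk)
  begin
    if falling_edge(clk) then
      assert q_ref = q_map
        report "post-map output differs from behavioral output"
        severity error;
    end if;
  end process compare;
end architecture sim;
```

The first assertion that fires gives the time 't' and the signal to
start tracing back from.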

- Brian

jame...@yahoo.ca

Sep 14, 2006, 10:06:56 AM

Thank you for your advice! I'll try to keep this thread
posted if and when I find answers.

Best regards,
-James

jame...@yahoo.ca

Sep 24, 2006, 7:35:57 PM

Xilinx tech. support said to separately register each level of logic,
since I have some lines of up to four xor statements being assigned to
a signal. I tried that, but it didn't help. However, the sub-module in
which the mapper is connecting two of my output registers together
works on its own in a separate project in post-map simulation when
those output registers are treated as port signals. It works on its own
with or without the added registers that do one xor at a time, but
still cross-connects with or without the added registers when used as a
submodule of user_logic.

Clearly I am dealing with undocumented features of the mapper; certain
coding techniques are required in order for it to accomplish my intent.
Xilinx really should be documenting these requirements; it's not fair
to tell people that "the problem is with the way you write your VHDL"
otherwise. Documentation for synthesis and translate is much better.

-James

jame...@yahoo.ca

Sep 25, 2006, 1:06:51 AM

I tried adding a separate level of registering in my main
line VHDL code and was trying to test it when the ModelSim
simulator died. No clock; therefore no signal processing. The
transcript (output window) looks normal and ends up with:

# ** Failure: Simulation successful (not a failure). No problems detected.
# Time: 1320 ns Iteration: 0 Process: /user_logic_tb/line__94 File: user_logic_tb.vhw
# Break at user_logic_tb.vhw line 273
# Simulation Breakpoint: Break at user_logic_tb.vhw line 273
# MACRO ./user_logic_tb.fdo PAUSED at line 16

Both post-Map and behavioral simulation show no clock and no
signal processing; all flat lines all of a sudden.
I'm looking at reinstalling. I'm using the ModelSim XE III 6.1e starter
edition. Does anyone know how to fix this without reinstalling?

Also in regards to my previous message:

Xilinx tech. support said to separately register each level of logic,
since I have some lines of up to four xor statements being assigned to
a signal. I tried that, but it didn't help. ...but
still cross-connects with or without the added registers when used as a
submodule of user_logic.

Would anyone have some suggestions about how to write the VHDL
so it won't do that?

Thanks in advance,
-James

KJ

Sep 25, 2006, 7:00:24 AM

<jame...@yahoo.ca> wrote in message
news:1159160810....@i3g2000cwc.googlegroups.com...

> Handy link for this entire thread:
> http://groups.google.com/group/comp.arch.fpga/browse_thread/thread/6d594b2ab04beb4b/e39055a323c18cd6#e39055a323c18cd6
>
> I tried adding a separate level of registering in my main
> line VHDL code and was trying to test it when the ModelSim
> simulator died. No clock; therefore no signal processing. The
> transcript (output window) looks normal and ends up with:
>
> # ** Failure: Simulation successful (not a failure). No problems detected.
> # Time: 1320 ns Iteration: 0 Process: /user_logic_tb/line__94 File: user_logic_tb.vhw
> # Break at user_logic_tb.vhw line 273
> # Simulation Breakpoint: Break at user_logic_tb.vhw line 273
> # MACRO ./user_logic_tb.fdo PAUSED at line 16
>
> Both post-Map and behavioral simulation show no clock and no
> signal processing; all flat lines all of a sudden.
> I'm looking at reinstalling. I'm using the ModelSim XE III 6.1e starter
> edition. Does anyone know how to fix this without reinstalling?

While it's not impossible that your Modelsim install got corrupted, I highly
doubt it and therefore suggest that reinstalling is likely going to be
wasted time. I've yet to 'fix' anything by re-installing Modelsim. I'd
suggest debugging as to why the clock signal is not running any more.
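
One hedged way to start on that is to take the design out of the
picture and confirm that a bare clock generator toggles in the
simulator at all. The sketch below is a generic stand-alone testbench,
not the ISE-generated user_logic_tb.vhw; CLK_PERIOD and the 100 ns run
length are made-up values. It also ends the run with a
'severity failure' assertion, which is a common way to stop a
simulation and would explain a "Failure: ... (not a failure)" line in
an otherwise healthy transcript.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity clock_probe_tb is
end entity clock_probe_tb;

architecture sim of clock_probe_tb is
  constant CLK_PERIOD : time := 10 ns;  -- hypothetical value
  signal clk : std_logic := '0';
begin
  -- Free-running clock.
  clock : process
  begin
    clk <= '0'; wait for CLK_PERIOD / 2;
    clk <= '1'; wait for CLK_PERIOD / 2;
  end process clock;

  -- Complain if clk has not changed within the last full period.
  watchdog : process
  begin
    wait for 2 * CLK_PERIOD;
    assert clk'last_event < CLK_PERIOD
      report "clk has stopped toggling"
      severity error;
  end process watchdog;

  -- Stop the run deliberately; this is not a real failure.
  stop : process
  begin
    wait for 100 ns;
    assert false
      report "end of test (not a failure)"
      severity failure;
    wait;
  end process stop;
end architecture sim;
```

If this runs and shows a clock in the wave window, the install is
probably fine and the place to look is the clock process or the wave
setup of the real testbench.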

>
> Also in regards to my previous message:
> Xilinx tech. support said to separately register each level of logic,
> since I have some lines of up to four xor statements being assigned to
> a signal. I tried that, but it didn't help. ...but
> still cross-connects with or without the added registers when used as a
> submodule of user_logic.

It didn't work because that was just a random guess on Xilinx tech support's
part to try to close the service request. Since the problem of why pre and
post VHDL models are acting differently has absolutely nothing to do with
your source code, their suggestion has 0% chance of solving the
problem....which you confirmed.

I don't know how you actually posed the question to Xilinx but the question
that should have been posed to them is along the lines of: "I have a
pre-route VHDL simulation design file and a post-route/map/whatever VHDL
simulation file that is the output of ISE version X.X. Given the same
input, they don't simulate the same. Signal 'X' at time 't' is a '1' using
the original design file and it is '0' using the VHDL output from ISE. I've
attached the original source VHDL files, the ISE project files which
includes the post-map VHDL as well as the testbench VHDL that generates the
stimulus and a Modelsim '.do' file which runs each design up until time 't'
where you can see that the signal 'X' coming out is different between the
two models. I've confirmed that my testbench generates input stimulus to
both designs that meets the input setup/hold time requirements of the final
routed design. My question is 'Why are the outputs different?'"

Is that anything close to how you worded it?

When posed in that manner, any answer/suggestion from Xilinx that does not
address the question of "Why are the outputs from the two simulations
different?" is irrelevant. Letting them off the hook with the suggestion of
changing your source code to add registers because you have "some lines of
up to four xor statements being assigned to a signal" (whatever that really
means) is just trying to make you go away without addressing your real
problem....but if your service request did not ask that basic question and
provide them with the two simulation models that demonstrate this difference
in the first place, well, they can only deal with what you provide them.

>
> Would anyone have some suggestions about how to write the VHDL
> so it won't do that?
>

Yes...as pointed out earlier in the thread...

1. Write (if you don't have one already) a testbench that instantiates the
original design file. Make sure all input setup and hold times in the
testbench meet the timing requirements listed by ISE in the timing analysis
report.
2. Run the testbench with both the original design file and the post-map
file and document where the two predict different results.
3a. Open a service request to Xilinx sending them this information and ask
the question as I mentioned in the previous paragraph.
3b. Debug into the post-map design file and see if you can determine the
cause for the difference while Xilinx is also chewing on it.
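Steps 1 and 2 can be sketched as a single testbench that drives both models with the same stimulus and flags the first point where they disagree. This is only an illustration under stated assumptions: the entity xor_reg, its two architectures (rtl as the original source, netlist as a hand-written stand-in for the ISE post-map model) and the timing constants are all hypothetical, not taken from the actual design or from ISE output.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical design: a registered XOR of two inputs.
entity xor_reg is
  port (clk, d0, d1 : in std_ulogic; q : out std_ulogic);
end entity xor_reg;

-- Stand-in for the original source.
architecture rtl of xor_reg is
begin
  process (clk) is
  begin
    if rising_edge(clk) then
      q <= d0 xor d1;
    end if;
  end process;
end architecture rtl;

-- Stand-in for the post-map model (in practice this is generated by the tool).
architecture netlist of xor_reg is
  signal lut_out : std_ulogic;
begin
  lut_out <= (d0 and not d1) or (not d0 and d1);
  process (clk) is
  begin
    if rising_edge(clk) then
      q <= lut_out;
    end if;
  end process;
end architecture netlist;

library ieee;
use ieee.std_logic_1164.all;

entity compare_tb is
end entity compare_tb;

architecture sim of compare_tb is
  -- Placeholder timing figures; use the ones from your own requirements.
  constant T_CLK   : time := 100 ns;
  constant T_SETUP : time := 85 ns;  -- inputs change this long before the edge
  constant T_VALID : time := 20 ns;  -- outputs are checked this long after it
  signal clk           : std_ulogic := '0';
  signal d0, d1        : std_ulogic := '0';
  signal q_pre, q_post : std_ulogic;
begin
  clk <= not clk after T_CLK / 2;

  pre : entity work.xor_reg(rtl)
    port map (clk => clk, d0 => d0, d1 => d1, q => q_pre);
  post : entity work.xor_reg(netlist)
    port map (clk => clk, d0 => d0, d1 => d1, q => q_post);

  -- Apply every input combination, each T_SETUP before a rising edge.
  stimulus : process is
  begin
    for i in 0 to 3 loop
      wait until rising_edge(clk);
      wait for T_CLK - T_SETUP;
      if i mod 2 = 1 then d0 <= '1'; else d0 <= '0'; end if;
      if i / 2 = 1 then d1 <= '1'; else d1 <= '0'; end if;
    end loop;
    wait;
  end process stimulus;

  -- Compare the two models once the outputs are supposed to be valid.
  check : process is
  begin
    wait until rising_edge(clk);
    wait for T_VALID;
    assert q_pre = q_post
      report "pre/post mismatch" severity error;
  end process check;
end architecture sim;
```

The time stamp on the first "pre/post mismatch" report is the time 't' to quote in the service request.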

Always keep in mind that the 'pre' and 'post' simulation models are ALWAYS
supposed to produce the same result given the same stimulus that meets all
input timing requirements and that this is ALWAYS TRUE NO MATTER WHAT the
original source code is. When this is not the case (and it does happen), as
I mentioned earlier in this thread the root cause of the discrepancy is
generally...

1. Testbench not meeting input setup/hold time requirements (i.e. you need
to fix your testbench).
2. Improper use of types other than 'std_logic/std_ulogic'. (From earlier
in the thread I thought you said there were none in your code. But if there
were, then again, this would be yours to fix).
3. Bug in the tool, in this case ISE. In this case, you need to open a
service request and have them explain to you why 'pre' and 'post' simulation
is producing different results....and not let them off with anything that
causes you to change your code except to fix something along the lines of #1
or #2 that you missed. Changing the design because "some lines of up to
four xor statements being assigned to a signal" is not an acceptable
reason...see the previous paragraph with the 'ALWAYS' in it for
justification.

KJ

jame...@yahoo.ca

Sep 26, 2006, 12:42:41 AM

Handy link for this entire thread:
http://groups.google.com/group/comp.arch.fpga/browse_thread/thread/6d594b2ab04beb4b/e39055a323c18cd6#e39055a323c18cd6

Hi KJ,

Thanks for your information and moral support.

About ModelSim not seeming to work, I think I was just tired. I wasn't
expanding the waveform window pane to look at the whole view. Without
doing that, I was just looking at the initial "offset" time of 100 ns
in which nothing was happening. It's also possible I was tired and not
remembering that I have to wait for the simulation to complete; I
recall seeing the final output was 'U' - Uninitialized, and that may be
the reason. Next time I should record screenshots so that I can prove I
was not hallucinating or, alternatively, give myself time
to realize what is going on.

"input setup/hold time" is the time required before a clock edge to
setup and hold an input signal so that the receiving FF will
successfully register it, is that right?

I am using the "Test Bench WaveForm" GUI feature (Xilinx ISE ver
8.2.03i, now) and in the clock timing input window that comes up
automatically when I create a tbw file, there are settings for "Input
setup time" and "output valid delay", that I have to fill in. "Input
setup time" is the time duration that the testbench will place my input
signal transitions before its clock edges, is that right? For example,
I have a "load" pulse that I draw in the GUI and the testbench will
make sure to activate its edges 1ns before the matching clock edges if
I set "input setup time" to 1ns, is that right? Does "output valid
delay" mean the time duration after a clock edge at which output data
becomes valid? If so, I don't understand what that tells the testbench
to do. Since that seems to be dependent upon the device under test and
yet it is a testbench entry parameter, I am confused as to the meaning
of that and must be understanding it wrong.

So I have to make sure that my "Input setup time" that I specify has
got to satisfy all input setup/hold times required by the mapper timing
report, is that right? What do I look at to specify the "output valid
delay" (it must be some kind of data in the timing report. I'll take a
look)? Does that mean I'm telling the testbench not to "look" at data
before that time period after a clock pulse just for reporting and
display purposes?

I am synthesizing and translating and getting correct operation in
simulation but then mapping and getting incorrect operation -- I have
an initial load into a submodule of 128 bits and a simple subsequent
output from the submodule of that separated into its four 32-bit words.
In that, post-map, the lower three bytes of the second register are
being duplicated into the third register, both in the signals used in
the calling module and in the signals used in the submodule. It looks
like logic is getting optimized away incorrectly, but could that really
be happening by violating setup and hold times in the testbench?

I am getting seven warnings of the following type from ModelSim when it
does post-map simulation:
# ** Warning: /X_SFF SETUP High VIOLATION ON CE WITH RESPECT TO CLK;
# Expected := 0.74 ns; Observed := 0.583 ns; At : 291.975 ns
# Time: 291975 ps Iteration: 3 Instance: /user_logic_tb/uut/slv_reg0_7
Is that just a slight difference as compared to the behavioral
simulation? I think that the simulator compares results, is that right?

To clarify what was meant by levels of logic, here is what the Xilinx
tech support engineer wrote:

-----------------------------
I looked at your code and I have some suggestions for you. You have
registered your design, but you haven't pipelined the design which will
more or less fix the issue you are having. What you did was place a
register on the output, but the output is not being optimized, the
combinational logic in-between is. This is what needs to be
registered. In one example of your code you have 4 logic levels.

w2(31 downto 24) <= wsav0(31 downto 24) xor wsav2(31 downto 24) xor
wsav1(31 downto 24) xor subwordsav(31 downto 24) xor rconsav;

In a fully pipelined design, you would only have one logic level per
register. Meaning you will xor two signals, register it and then xor
it with the next, register it and so on. This should fix your problem.
Map is optimizing all your combinational logic in look-up tables (LUTs)
and it's removing the duplicate XOR gates. Placing the registers
between the logic levels will stop that.

In regards to your other questions, due to the tools behaving correctly
[THAT HAS NOT BEEN ESTABLISHED], this has become a design issue and no
longer a support issue. Unfortunately, the technical support team
doesn't have the resources to help you further with your case. We do
have other options available though. If you go to the Design and
Education Services page (linked below) there are a couple of options in
which you can get help.

http://www.xilinx.com/support/gsd/

We have the Titanium Engineering group as well as Xilinx Design
Services. This is in addition to possible help from the local FAE.
I'm sorry I am no longer able to help you with your issues, but I hope
my information has pointed you in the right direction toward solving
your problem.

I will go ahead and close this case; if you have any other support
issues please open another Webcase and our team will be happy to assist
you.
-----------------------------

I did the above suggestion and my job of combining only pairs of
signals at a time at alternate clock edges was a work of art, but it
produced the exact same results.

I have produced screenshots of the simulations that show correct
operation in behavioral and the flaw in post-map and sent that to
Xilinx. The way I worded it was to ask for the correct VHDL coding
style to prevent that kind of optimization and I complained about
undocumented mapper coding requirements. So now the problem doesn't
look like VHDL coding style any more.

A while ago I removed my one boolean signal and replaced it with
std_logic that I can test in "if" statements just as well, so I haven't
had anything but std_logic in my design during the last couple of weeks
or so.

I'm getting back to Xilinx and I'll try your approach of getting them
to explain what is going on, use your wording, and try not to accept
anything less than working with me until the issue is solved.

What is that post-map design file that I can look at? I do have six
unconnected signals being reported as nets that have no load in the map
report, but the FAE (Field Application Engineer) says he never has any
problems with that type of thing. However, he agrees that it wouldn't
hurt to look at the "Technology Schematic", produced post-translate,
identify those nets that map is complaining about and see if they are
actually intended to have loads (I thought everything in my design is
connected up). That's what I'm going to do next, now that I've removed
a bunch of VHDL code that doesn't change anything and just adds
complexity.

Best regards,
-James

jame...@yahoo.ca

Sep 26, 2006, 12:48:46 AM

Here are the ISE project settings that I changed from the defaults:

Synthesize properties: "Keep Hierarchy" on. (Left "Equivalent Register
Removal" on).
Translate properties: "Preserve Hierarchy on Sub Module" on.
Mapper: "Trim unconnected signals" off
(left "Allow logic opt. across hierarchy" off).
Opt. strategy: speed
Generate detailed Map report
By default, equivalent register removal is on, so I had to turn on global
optimization mode (-global_opt on) in order to turn it off.
Trim unconnected signals (-u) - I had to turn it on, according to the mapper
output, as long as I use global optimization mode.
(-u is known as "(Do Not Remove Unused Logic)" on pg 141 of dev.pdf ver 8.2i.)

-r map option (no register ordering)

jame...@yahoo.ca

Sep 26, 2006, 12:54:16 AM

Here are the definitions:
From the installed help files at
C:\Xilinx\doc\usenglish\help\iseguide\mergedProjects\xsim\html\xs_hidd_initialize_timing_dialog.htm

The Input Setup Time is the minimum amount of time between the arrival
of an input pulse and a clock pulse for that input pulse to be
considered valid.

The Output Valid Delay is the maximum amount of time allowed for an
output to change states for it to be considered valid when used in a
self-checking test bench. The Test Bench Waveform Editor can write out
a self-checking test bench. For more information see Generating
Expected Simulation Results. Time units are determined by the Time
Scale drop-down menu at the lower right.

jame...@yahoo.ca

Sep 26, 2006, 2:00:05 AM

Mapper warnings are:
WARNING:LIT:243 - Logical network u0/N0 has no load.
WARNING:LIT:243 - Logical network u0/N1 has no load.
WARNING:LIT:243 - Logical network u0/r0/N01 has no load.
WARNING:LIT:243 - Logical network u0/r0/N11 has no load.
WARNING:LIT:243 - Logical network Bus2IP_Clk_BUFGP/N2 has no load.
WARNING:LIT:243 - Logical network Bus2IP_Clk_BUFGP/N3 has no load.

Can't find these in the technology schematic (post-translate). There is
a u0/r0/N0 and u0/r0/N1 and a u0/r0/N21 and u0/r0/N31 in the Technology
schematic. Many u0/Ns followed by three digit numbers. Nothing named
"N" anything in the RTL schematic. Saw Bus2IP_Clk_BUFGP, but no /N2 or
/N3 (nothing after Bus2IP_Clk_BUFGP). I could try a new project before
doing map and see if map is going back and removing those.

Did that. Made a new project and did only up to synthesize and
translate. There is N0 and N1 in the top level in the Technology
schematic. There is a u0/r0/N0 and u0/r0/N1 and u0/r0/N31. Saw
Bus2IP_Clk_BUFGP, but no /N2 or /N3 (nothing after Bus2IP_Clk_BUFGP).
Plenty of N's at the top level. Map *does* seem to be going back and
adding to or changing the Technology schematic, but not taking away the
above, which never seemed to be there. Not good.

KJ

Sep 26, 2006, 6:44:04 AM

<jame...@yahoo.ca> wrote in message
news:1159245761....@k70g2000cwa.googlegroups.com...

> "input setup/hold time" is the time required before a clock edge to
> setup and hold an input signal so that the receiving FF will
> successfully register it, is that right?

Yes, almost. Setup time is how long the signal must be stable before the
clock edge. Hold time is how long the signal must remain stable after the
clock edge (often, but not always, hold time is 0).

>
> I am using the "Test Bench WaveForm" GUI feature (Xilinx ISE ver
> 8.2.03i, now) and in the clock timing input window that comes up
> automatically when I create a tbw file, there are settings for "Input
> setup time" and "output valid delay", that I have to fill in. "Input
> setup time" is the time duration that the testbench will place my input
> signal transitions before its clock edges, is that right?

Yes

> For example,
> I have a "load" pulse that I draw in the GUI and the testbench will
> make sure to activate its edges 1ns before the matching clock edges if
> I set "input setup time" to 1ns, is that right?

Yes

> Does "output valid
> delay" mean the time duration after a clock edge at which output data
> becomes valid? If so, I don't understand what that tells the testbench
> to do. Since that seems to be dependent upon the device under test and
> yet it is a testbench entry parameter, I am confused as to the meaning
> of that and must be understanding it wrong.

Sort of. What you're telling the testbench is how long it takes for the
outputs of your design to become valid, so that it doesn't bother to check
them before that time. You're right that this depends on the device
under test, but so do the input setup times that you're entering. Generally
what you would put into the testbench for these are numbers that you 'can
live with' in your actual design. By that I mean the FPGA doesn't usually
live in isolation; it is connected to outside devices that may have timing
requirements as well.

For example, suppose one of the inputs to the FPGA is connected to some device
that has Tco = 10 ns, and that part and the FPGA both transmit/receive this
signal with the same 10 MHz clock (i.e. T = 100 ns). As a first-order
approximation, one could calculate the input setup time requirement at the
FPGA as T - Tco = 90 ns. A more accurate approximation would recognize that
clock skew between the two devices and delay on the printed circuit board
will eat into that timing margin, so you should take those into consideration
as you define what the FPGA input setup requirements are. The point is that you
probably wouldn't want to enter the full 90 ns as the FPGA's requirement.
Instead, get some estimates for those other two, or swag them as being less
than 5 ns or so, and enter 85 ns. For a really high speed design you'll be
more careful about determining the things that go into the timing budget, for
the simple reason that there is less time per clock cycle and you can't afford
not to have better control over everything.

Anyway, whatever that input setup time requirement is that you determine
should go two places: as a constraint to the fitter that it needs to meet
and into your testbench. Also note that the requirement is determined
without regard to any timing numbers from the FPGA itself. It is a
requirement that is determined by the outside world around the FPGA. Most
people do go through some form of this figuring out and enter the number
into the fitter so that when the timing analysis is performed it clearly
flags violations. What most do not do is to enter that exact same
requirement into the testbench.

Repeat this process for computing what the clock to output delay requirement
of the FPGA is. Here you would start with your clock period and subtract
off the input setup time of whatever device is on the receiving end of the
FPGA output, subtract off clock skew, subtract off PCB delay. The basic
process is the same.
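As a rough sketch of getting that number into the testbench, the budget from the example above can be written as constants of type time and used to place the input transitions. The figures are the ones from the example (100 ns period, 10 ns Tco, about 5 ns for skew and PCB delay); the entity name and the 'load' signal are hypothetical, and the design under test is left out.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity timing_budget_tb is
end entity timing_budget_tb;

architecture sim of timing_budget_tb is
  constant T_CLK    : time := 100 ns;  -- 10 MHz clock period
  constant T_CO     : time := 10 ns;   -- clock-to-output of the upstream device
  constant T_MARGIN : time := 5 ns;    -- estimate for clock skew plus PCB delay
  constant T_SETUP  : time := T_CLK - T_CO - T_MARGIN;

  signal clk  : std_ulogic := '0';
  signal load : std_ulogic := '0';     -- hypothetical input of the design under test
begin
  -- Sanity check on the arithmetic: 100 ns - 10 ns - 5 ns
  assert T_SETUP = 85 ns
    report "unexpected setup budget" severity failure;

  clk <= not clk after T_CLK / 2;

  -- Each transition on 'load' happens T_SETUP before the next rising edge,
  -- which is T_CLK - T_SETUP after the previous one.
  stimulus : process is
  begin
    wait until rising_edge(clk);
    wait for T_CLK - T_SETUP;
    load <= '1';
    wait until rising_edge(clk);
    wait for T_CLK - T_SETUP;
    load <= '0';
    wait;
  end process stimulus;
end architecture sim;
```

The same 85 ns figure would then go into the fitter as the input setup constraint, so that the testbench and the timing analysis are working from one requirement.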

>
> So I have to make sure that my "Input setup time" that I specify has
> got to satisfy all input setup/hold times required by the mapper timing
> report, is that right? What do I look at to specify the "output valid
> delay" (it must be some kind of data in the timing report. I'll take a
> look)? Does that mean I'm telling the testbench not to "look" at data
> before that time period after a clock pulse just for reporting and
> display purposes?

See above for the more detailed answer.

>
> I am synthesizing and translating and getting correct operation in
> simulation but then mapping and getting incorrect operation -- I have
> an initial load into a submodule of 128 bits and a simple subsequent
> output from the submodule of that separated into its four 32-bit words.
> In that, post-map, the lower three bytes of the second register are
> being duplicated into the third register, both in the signals used in
> the calling module and in the signals used in the submodule. It looks
> like logic is getting optimized away incorrectly, but could that really
> be happening by violating setup and hold times in the testbench?
>

Don't know. Just as with real life, when you violate timing requirements
pretty much anything might happen. It would depend totally on the actual
implementation. If that's what it's doing, then that's what it's doing.

> I am getting seven warnings of the following type from ModelSim when it
> does post-map simulation:
> # ** Warning: /X_SFF SETUP High VIOLATION ON CE WITH RESPECT TO CLK;
> # Expected := 0.74 ns; Observed := 0.583 ns; At : 291.975 ns
> # Time: 291975 ps Iteration: 3 Instance: /user_logic_tb/uut/slv_reg0_7
> Is that just a slight difference as compared to the behavioral
> simulation? I think that the simulator compares results, is that right?

The simulator calls it a warning but it really is a design error. What
exactly the simulation model does when this occurs is a function of how the
simulation model is coded. Best to investigate why the logic path is long
and clean up those problems now. If all of the timing violations are setup
times, you can temporarily code around this simply by slowing down your
clock for simulation. Get the testbench to the point that it can run the
original source and the post-map model with no reported timing errors.

The more thorough check though is to compare the timing requirements that
you have in your testbench with what pops out of the timing analysis report.
The reason it is more thorough is that your simulation might not happen to
hit every input under just the right conditions. Bottom line is that if
you've determined that the input setup requirement is 15 ns then the timing
report had better report something less than 15 ns in the actual
implementation or you have a design issue to resolve. Timing analysis does
not require any simulation. Do this for all inputs and outputs.

The problem as I see it is that the original source and the post-map
simulation models don't agree. I don't see anything in the response from
Xilinx to address this but maybe you didn't word it in exactly that manner.
Once you get your timing errors cleaned up you should have a testbench that
will provide the identical stimulus to both models. If they still perform
differently then open another service request to Xilinx (or your FAE) and
have them explain why the two are different. It has absolutely nothing to
do with optimizations. Original source and post-anything must be
functionally identical...regardless of what the function actually is. Stick
to your guns on that and don't let them off the hook but also don't
sidetrack them with optimization settings or any of that. It is the
software's job to translate your code into something functionally identical
that they can actually implement inside a real device.

Just make sure you've cleaned up all timing problems first, since that is on
your side, not theirs.

>
> I did the above suggestion and my job of combining only pairs of
> signals at a time at alternate clock edges was a work of art, but it
> produced the exact same results.
>
> I have produced screenshots of the simulations that show correct
> operation in behavioral and the flaw in post-map and sent that to
> Xilinx. The way I worded it was to ask for the correct VHDL coding
> style to prevent that kind of optimization and I complained about
> undocumented mapper coding requirements. So now the problem doesn't
> look like VHDL coding style any more.

Other than things I've mentioned earlier in the thread, such as incorrectly
using types other than std_logic/std_ulogic, or latches, it never is coding
'style' per se. So do the timing analysis, verify that the testbench presents
inputs and checks outputs at the appropriate times per your requirements, then
re-collect the screenshots, send them off, and ask them to explain the
difference, since the two models are supposed to be functionally
identical.

> A while ago I removed my one boolean signal and replaced it with
> std_logic that I can test in "if" statements just as well, so I haven't
> had anything but std_logic in my design during the last couple of weeks
> or so.

Good, one less thing to worry about right now. The reason std_logic is such
a good thing is simply because of the value 'X'. Being able to get rid of
all the unknowns in a simulation is a milestone of sorts and when you can't
for whatever reason then that big 'X' is staring at you pointing you right
to the place to investigate. Anyway, you don't have that but just thought
I'd toss out why using only std_logic is a 'good' thing until you've been
doing this for a while and can move on with confidence to other types.

>
> I'm getting back to Xilinx and I'll try your approach of getting them
> to explain what is going on, use your wording, and try not to accept
> anything less than working with me until the issue is solved.
>

Good plan.

> What is that post-map design file that I can look at? I do have six
> unconnected signals being reported as nets that have no load in the map
> report, but the FAE (Field Application Engineer) says he never has any
> problems with that type of thing. However, he agrees that it wouldn't
> hurt to look at the "Technology Schematic", produced post-translate,
> identify those nets that map is complaining about and see if they are
> actually intended to have loads (I thought everything in my design is
> connected up). That's what I'm going to do next, now that I've removed
> a bunch of VHDL code that doesn't change anything and just adds
> complexity.

It shouldn't be a problem, and if it is, it's Xilinx's problem anyway.
Remember that your original source code is not the actual implementation; it is
an abstraction. FPGAs do not implement 'a xor b' directly. The tools translate
it into a lookup table that has been programmed appropriately and enable pass
transistors to get signals 'a' and 'b' to the proper inputs of that table.
Sometimes warnings can point to problems. In this case I don't think they
will, but like the FAE said, it wouldn't hurt to look either. I'm not
exactly sure which file you need to look at, though. Maybe try searching
through all of the files for the net name that is being flagged and see what
hits.

KJ

jame...@yahoo.ca

Sep 26, 2006, 7:50:32 PM

Thanks for that, KJ. Meditating on timing issues, and looking at the
mapper timing report resulted in me setting the input setup times and
output valid delays in the testbench to exceed the maximum figures of
"Setup to clk (edge)" and "clk (edge) to pad" that were reported. I
took the additional conservative step of halving the clock frequency.
Now my post-map sim. is giving me correct results. It is odd that
improper timing would cause byte mixup like that, but I can certainly
contemplate types of interconnections that might behave that way. The
only other thing that I think helped was making my design synchronous
by registering all data in clocked registers. One thing I clued into is
that you can't, at least in Xilinx, set a signal in two or more
different processes, because that results in multiple sourcing and
unknown ('X') output. Also, if something like the following is written,
which is the correct way to write a register:

MY_PROC : process (clk, rst) is
begin
  if (rst = '0') then
    a <= '1';
  elsif (clk'event and (clk = '1')) then
    if (b = '0') then
      a <= '1';
    else
      a <= '0';
    end if;
  end if;
end process MY_PROC;

Whatever is put in for "rst" will be the reset, and if you try to put
other signals, that you use for something else elsewhere, in the place
of "rst" in the above, Xilinx will connect them to your input port's
reset signal and create all sorts of mess.

Best regards,
-James

Martin Thompson

Sep 27, 2006, 4:09:15 AM

Hi James,

jame...@yahoo.ca writes:
<snip>

> The only other thing that I think helped was making my design
> synchronous by registering all data in clocked registers.

That's always a good idea.

> One thing I clued into is that you can't, at least in Xilinx, set a
> signal in two or more different processes, because that results in
> multiple sourcing and unknown ('X') output.

You can't do that physically on-chip for most (all?) "modern" FPGAs.
If two drivers drive different values onto one wire, they will
conflict. 'X' is the simulator's way of telling you that.

However, if all but one of the processes is driving a 'Z', most
synthesisers will create you a bunch of multiplexers such that the one
process that is driving a non-Z value at any time will "win" and the
signal will then take on that value within the FPGA, just like in
simulation.

I usually try and avoid doing this as you can end up with lots of
extra logic that you didn't expect, and debugging what's going on on
chip if more than one process does drive a non-Z value can be a bit of
a pain!
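For illustration, here is a minimal sketch of the "all but one process drives 'Z'" arrangement. The entity and signal names are made up for the example, and how a given synthesiser maps it (multiplexers or otherwise) is tool dependent; the sketch only shows the source-level pattern and how it resolves in simulation.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity shared_net is
  port (
    sel_a  : in  std_logic;
    data_a : in  std_logic;
    data_b : in  std_logic;
    y      : out std_logic
  );
end entity shared_net;

architecture rtl of shared_net is
  signal shared : std_logic;  -- resolved type, so two drivers are legal
begin
  -- Drives the shared net only while sel_a is '1', otherwise releases it.
  drive_a : process (sel_a, data_a) is
  begin
    if sel_a = '1' then
      shared <= data_a;
    else
      shared <= 'Z';
    end if;
  end process drive_a;

  -- Drives the shared net only while sel_a is '0', otherwise releases it.
  drive_b : process (sel_a, data_b) is
  begin
    if sel_a = '0' then
      shared <= data_b;
    else
      shared <= 'Z';
    end if;
  end process drive_b;

  y <= shared;
end architecture rtl;
```

Because the two enable conditions are mutually exclusive, at most one process drives a non-Z value at any time, so the resolved value follows that driver instead of going to 'X'.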

>Also, if something like the following is written,
> which is the correct way to write a register:
>
> MY_PROC : process (clk, rst) is
> begin
> if (rst = '0') then
> a <= '1';
> elsif (clk'event and (clk = '1')) then
> if (b = '0') then
> a <= '1';
> else
> a <= '0';
> end if;
> end if;
> end process MY_PROC;
>
> Whatever is put in for "rst" will be the reset, and if you try to put
> other signals, that you use for something else elsewhere, in the place
> of "rst" in the above, Xilinx will connect them to your input port's
> reset signal and create all sorts of mess.
>

I'm not sure what you mean by this. If you ask for other signals to be
connected to the rst input of the flipflop, then the tools will surely
do as you ask. The fact that this reset is asynchronous may then
cause you grief. If you want to reset a register during "runtime"
rather than just as an initialisation stage, you're much better off
using a synchronous reset:

MY_PROC : process (clk, arst) is
begin
  if (arst = '0') then -- async reset
    a <= '1';
  elsif (clk'event and (clk = '1')) then
    if srst = '1' then -- sync reset
      a <= '1';
    else
      -- the rest of the code
    end if;
  end if;
end process MY_PROC;

If you look back over this group's discussions, you'll find lots more
details on the potential pitfalls of asynchronous resets as well...

Cheers,
Martin

--
martin.j...@trw.com
TRW Conekt - Consultancy in Engineering, Knowledge and Technology
http://www.conekt.net/electronics.html

KJ

Sep 27, 2006, 6:46:07 AM

<jame...@yahoo.ca> wrote in message
news:1159314632.7...@i42g2000cwa.googlegroups.com...

> Now my post-map sim. is giving me correct results.

Glad to hear that it's working now.

> It is odd that
> improper timing would cause byte mixup like that, but I can certainly
> contemplate types of interconnections that might behave that way.

As with timing problems on a real board, the results you see with a timing
problem using post-tool models are usually 'odd'. It's completely
deterministic (unlike a real board) but it seems odd because of the mapping
from your source into an actual implementation. That mapping, while
producing something that is functionally identical, is generally not what
you would expect. But it works and that mapping is what the tool is
supposed to be good at doing so don't lose sleep over it.

> The
> only other thing that I think helped was making my design synchronous
> by registering all data in clocked registers. One thing I clued into is
> that you can't, at least in Xilinx, set a signal in two or more
> different processes, because that results in multiple sourcing and
> unknown ('X') output.

You can't in any logic (not just Xilinx) have more than one process driving
a signal. It is just like two outputs driving the same signal on a board:
it can't be done unless the design is such that all but one driver is setting
the signal to 'Z'. Obviously this can be useful for data busses (which
would explicitly set the output to 'Z' except when being read from) but for
most other signals this is not the case.

One trick for catching this bug earlier (instead of having to debug to find
that the reason for the 'X' is two processes driving) is to use the type
std_ulogic (and std_ulogic_vector) instead of std_logic and std_logic_vector
for all signals except those that truly do require multiple drivers. What
you'll find if you make this conversion is that the compiler will flag this
as an error for you right up front, even before you get into simulation
(assuming that these two processes are in the same entity and physically in
the same file). If the two processes are in totally different entities and
are in separate source files, the compiler won't complain when compiling
either file, but the moment you invoke the simulator it will complain about
net 'xyz' being driven in more than one place and will generally point you
to the two places....one of which must be wrong. Much easier to fix that
way than having to debug down to why a signal is 'X'. Try it out.

Most people (myself included) grew up using std_logic because that is what
was taught and the switch to something else on something so basic can be
difficult to 'unlearn' but it is worth it. It's also not a big leap. The
type std_logic is actually derived from std_ulogic, they have all the same
values and everything. The only difference is that a std_logic signal is
allowed to have multiple drivers, std_ulogic is not (which is why the
compiler can flag these as errors). The other way to catch the problem is
to allow the synthesis tool to run all the way through. At 'some' point
every tool is going to complain about two drivers on a net where the drivers
are not tri-stated.
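A small sketch of what that looks like in practice, with made-up entity and signal names. As written it has a single driver and should compile cleanly. The second process is left commented out on purpose: uncommenting it gives the unresolved signal two drivers, which a compiler or simulator is expected to reject (the exact message and the stage at which it appears vary by tool).

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity single_driver is
  port (
    clk : in  std_ulogic;
    b   : in  std_ulogic;
    a   : out std_ulogic
  );
end entity single_driver;

architecture rtl of single_driver is
  signal a_int : std_ulogic;  -- unresolved type: only one driver is allowed
begin
  first : process (clk) is
  begin
    if rising_edge(clk) then
      a_int <= b;
    end if;
  end process first;

  -- A second driver for a_int. With std_ulogic this is an error;
  -- with std_logic it would compile and show up later as 'X' in simulation.
  -- second : process (clk) is
  -- begin
  --   if rising_edge(clk) then
  --     a_int <= not b;
  --   end if;
  -- end process second;

  a <= a_int;
end architecture rtl;
```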

> Also, if something like the following is written,
> which is the correct way to write a register:
>
> MY_PROC : process (clk, rst) is
> begin
> if (rst = '0') then
> a <= '1';
> elsif (clk'event and (clk = '1')) then
> if (b = '0') then
> a <= '1';
> else
> a <= '0';
> end if;
> end if;
> end process MY_PROC;
>
> Whatever is put in for "rst" will be the reset, and if you try to put
> other signals, that you use for something else elsewhere, in the place
> of "rst" in the above, Xilinx will connect them to your input port's
> reset signal and create all sorts of mess.

You have to be very careful about using async resets inside an FPGA. I'm
assuming that what you meant by additional signals is something along the
lines of

MY_PROC : process (clk, rst, xx, yy, zz) is
begin
  if ((rst or xx or yy or zz) = '0') then
    a <= '1';
  elsif (clk'event and (clk = '1')) then
    .....

The problem you'll find is that when the term 'rst or xx or yy or zz' gets
computed it can glitch, because inside an FPGA, remember, all logic gets
implemented as lookup tables and pass transistors, and you can't count on
the output of such an implementation to not have a glitch. That glitch
will now propagate through your design and reset either all or some
(depending on how big and long the glitch is) of the flip flops in your
design. If you use async resets, do not deviate from the typical template,
and furthermore you need to ensure that the 'rst or xx or yy or zz' is the
output of a flip flop (which will not glitch) and also meets timing
requirements. Google or search on this group to find more on resets.
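
As a rough sketch of what "make the reset the output of a flip flop" could look like: the entity name reg_reset_sketch and the internal signals rst_comb_n and rst_reg_n are invented for this example, and it assumes rst, xx, yy and zz are already synchronous to clk. The register body is borrowed from the quoted MY_PROC. Treat it as an illustration under those assumptions, not a complete reset scheme.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity reg_reset_sketch is
  port (
    clk             : in  std_ulogic;
    rst, xx, yy, zz : in  std_ulogic;  -- reset terms from the post
    b               : in  std_ulogic;
    a               : out std_ulogic
  );
end entity reg_reset_sketch;

architecture rtl of reg_reset_sketch is
  signal rst_comb_n : std_ulogic;         -- LUT output, may glitch
  signal rst_reg_n  : std_ulogic := '0';  -- flip flop output, glitch free
begin
  -- Combinational term: never use this directly as an async reset.
  rst_comb_n <= rst or xx or yy or zz;

  -- Register the term so the async reset net is driven by a flip flop.
  REG_RESET : process (clk) is
  begin
    if rising_edge(clk) then
      rst_reg_n <= rst_comb_n;
    end if;
  end process REG_RESET;

  -- Typical async-reset template, now using the registered reset.
  MY_PROC : process (clk, rst_reg_n) is
  begin
    if rst_reg_n = '0' then
      a <= '1';
    elsif rising_edge(clk) then
      if b = '0' then
        a <= '1';
      else
        a <= '0';
      end if;
    end if;
  end process MY_PROC;
end architecture rtl;
```

The registered reset arrives one clock later than the raw term, and its release still has to meet timing at every flip flop it feeds, which is the "meets timing requirements" part above.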

The other safer way is simply do not use the async reset at all and just use
synchronous resets. It will likely make life much easier and there is
almost never any difference in resource usage or anything. The template
then becomes

MY_PROC : process (clk) is
begin
  -- Using the 'rising_edge' function is much more descriptive
  -- than clk'event and (clk = '1')
  if rising_edge(clk) then
    if ((rst or xx or yy or zz) = '0') then
      a <= '1';
    else
      ....
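
For reference, a filled-in version of the synchronous template might look like the sketch below. The entity name sync_reset_sketch and the port list are assumptions made so the example is self-contained, and the non-reset branch is simply copied from the register quoted earlier in the thread; it is not meant as the only way to complete the template.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity sync_reset_sketch is
  port (
    clk             : in  std_ulogic;
    rst, xx, yy, zz : in  std_ulogic;  -- reset terms from the post
    b               : in  std_ulogic;
    a               : out std_ulogic
  );
end entity sync_reset_sketch;

architecture rtl of sync_reset_sketch is
begin
  -- Only clk is in the sensitivity list: the reset is sampled on the
  -- clock edge, so a glitch on the combined term between edges is ignored.
  MY_PROC : process (clk) is
  begin
    if rising_edge(clk) then
      -- As written in the post, this condition is true only when all
      -- four signals are '0'.
      if (rst or xx or yy or zz) = '0' then
        a <= '1';
      elsif b = '0' then
        a <= '1';
      else
        a <= '0';
      end if;
    end if;
  end process MY_PROC;
end architecture rtl;
```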

KJ

jame...@yahoo.ca

Sep 29, 2006, 2:58:34 AM

Thanks for the extra info, guys. Am now stuck on a ModelSim path
problem that has a plain user (me) stuck dead in his tracks. I am
starting a new thread for that.

Weng Tianxiang

Oct 1, 2006, 8:35:08 PM

Hi David,
I never said what I have listed is the best one in performance, but it
really saves me a lot.

I never pay attention to what has already been built into Xilinx ISE.
I never spend time doing unnecessary things.

What I pay most of my attention to is writing the right logic that runs
fastest in the market!

Thank you, Xilinx engineers, who provide a really reliable platform to
let me do my best and make my money and my living.

Thank you.

Weng
