Samsung is Hiring! 32 positions in California and Texas at the intersection of computer graphics and digital chip design

Yuri Panchul

May 21, 2026, 12:35:46 AM

to SystemVerilog Meetups in Silicon Valley

https://verilog-meetup.com/2026/05/20/samsung-is-hiring-32-positions-in-california-and-texas-at-the-intersection-of-computer-graphics-and-digital-chip-design/

The Samsung team that designs the Xclipse GPU in Galaxy phones with Exynos SoCs is expanding. There are 32 jobs available in two main locations:

Samsung Advanced Computing Lab (ACL) in San Jose, California
Samsung Austin Research and Development Center (SARC) in Austin, Texas
There is also an office in San Diego, California

Look at the selection:

Design flow and Methodology, GPU Architecture, Microarchitecture, RTL Design, Design Verification, SoC Architect / Memory Subsystem, Physical Design / Floorplan, Power Management, Post-Silicon Debug & Validation, SoC Performance Architect, Memory Controller Micro-Architect, Research Architect, SoC Architect / Coherent Interconnect, Physical Design Feasibility, GPU Software Architecture, Compilers, Static Timing Analysis CAD Engineer, Physical Implementation CAD Lead, GPU Performance Architect and others.

Position details:

https://semiconductor.samsung.com/about-us/careers/jobs/?filter=%28SemiFilCode%3Abusiness-unitSARC%29&searchvalue=acl&region=us

You can apply directly on the website or, if you want, I can make you an internal referral, since I (Yuri Panchul) am a member of the GPU team. However:

1. I can give a referral for RTL design positions only.

2. Unfortunately, I can do it only for US citizens and green card holders (H-1 visas are difficult these days).

3. In order for me to make sure I recommend high-quality candidates to hiring managers, I need you to solve a problem: read my SNUG article and implement Challenge Number 3 with division and flow control. You can contact me on LinkedIn.

https://verilog-meetup.com/2026/03/15/snug-silicon-valley-2026/
https://verilog-meetup.com/wp-content/uploads/2026/03/yuri_panchul_2026_02_03_snug_silicon_valley_paper.pdf
https://www.linkedin.com/in/yuripanchul

4. After you solve the problem, I need to review the solution with you and ask you to solve two or three smaller puzzles in front of me. If you live in Silicon Valley, I can do it on Sunday during the Verilog Meetup from 11 am to 2 pm at Hacker Dojo, 855 Maude Ave, Mountain View, California. If you live somewhere else in the States, we can do it over Zoom.

The link to the Verilog Meetup: https://verilog-meetup.com/about/

5. If I like your solutions, I will enter your data into the Samsung SARC/ACL internal referral website. After that, you will get an email from the website and can start an application to go through the official Samsung interview process. I will also forward your resume to the hiring managers and the team’s recruiters.

6. Note that solving a couple of puzzles for me is not a part of the official Samsung interview process. This is just my personal way to ensure I forward only the relevant resumes to the company.

The picture shows an example of the industry-standard GPU benchmarks we use to test our designs.

Kevin Cameron

May 25, 2026, 6:28:01 AM

to SystemVerilog Meetups in Silicon Valley

Claude tells me we have a super fast Verilog simulator running on a RISC-V softcore array, but my Xilinx FPGA is on the small side.

Does anyone want to help me out with programming it into a Stratix 10 (I don't have a license)?

Happy to chat about using any other FPGAs, my kit is a ZCU104.

Kev.

yu...@panchul.com

May 25, 2026, 10:18:57 AM

to Kevin Cameron, SystemVerilog Meetups in Silicon Valley

Kevin:

*** Claude tells me we have a super fast Verilog simulator running on a RISC-V softcore array, but my Xilinx FPGA is on the small side. ***

Your statement does not make sense.

There is no way in hell that a Verilog simulator running on an array of RISC-V soft cores synthesized on any Xilinx FPGA could beat any regular Verilog simulator (Synopsys VCS, Cadence Xcelium, Mentor Questa) running a typical Verilog simulation load on a regular x64 computer.

Soft cores on FPGAs generally run at much lower clock frequencies than cores in ASICs (like 20 MHz versus 2 GHz). It is possible to have an FPGA-optimized core running at 200, maybe 500 MHz, but this would be a simple core with a static pipeline, not a high-performance superscalar. In the case of Verilog simulation, you cannot compensate for this low frequency with a large number of RISC-V cores, because the bottleneck of these cores communicating through shared memory will kill the performance.
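
The frequency argument above can be put into a rough back-of-envelope model. This is only a sketch, not a measurement: it uses Amdahl's law as a stand-in for the shared-memory bottleneck, assumes (optimistically for the soft cores) that a soft core does the same work per clock as the x64 core, and the 5% serial fraction is a hypothetical number chosen for illustration.

```python
def amdahl_speedup(cores: int, serial_fraction: float) -> float:
    """Ideal Amdahl's-law speedup of `cores` cores over one identical core."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


def relative_throughput(cores: int, soft_mhz: float, cpu_mhz: float,
                        serial_fraction: float) -> float:
    """Throughput of a soft-core array relative to a single CPU core.

    Assumption: equal work per clock cycle on both kinds of core, which
    favors the soft cores (a superscalar CPU does more per clock).
    """
    return (soft_mhz / cpu_mhz) * amdahl_speedup(cores, serial_fraction)


if __name__ == "__main__":
    SERIAL_FRACTION = 0.05  # hypothetical share of work serialized on shared memory
    CPU_MHZ = 2000.0        # the 2 GHz figure from the text
    for soft_mhz in (20.0, 200.0):
        for cores in (1, 16, 256):
            ratio = relative_throughput(cores, soft_mhz, CPU_MHZ, SERIAL_FRACTION)
            print(f"{cores:4d} cores at {soft_mhz:5.0f} MHz: "
                  f"{ratio:.3f}x of one 2 GHz core")
```

In this model the ratio can never exceed (soft_mhz / cpu_mhz) / serial_fraction, no matter how many cores are added. With the hypothetical 5% serial fraction, a 20 MHz array is capped at 0.2 of one 2 GHz core and a 200 MHz array at 2, before accounting for the per-clock advantage of a superscalar CPU.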

Xilinx FPGA-based emulators such as Synopsys ZeBu do _NOT_ achieve their high performance by running a software simulator on an array of RISC-V soft cores synthesized into a Xilinx FPGA. They work on a completely different principle: they do a specialized synthesis of the given Verilog design, without using soft RISC-V cores as an intermediate layer.

Your claim sounds like "Linux kernel can run super fast if we rewrite it in Python". This is just not something connected with reality.

If you need a fast software simulator (Verilog, RTL level) and don't have access to commercial tools (Synopsys VCS, Cadence Xcelium or the slower Mentor Questa), you can use Verilator (fewer features, but still good performance) or the slower Icarus Verilog (it runs ~50 times slower than Synopsys VCS but would still beat any "super fast Verilog simulator running on a RISC-V softcore array synthesized in FPGA").

If you want to use a Xilinx board for prototyping, just synthesize your design; there is no point in adding an extra layer (a Verilog simulator running on a RISC-V softcore array synthesized in an FPGA). It is like adding horses to move your car.

*** Does anyone want to help me out with programming it into a Stratix 10 (I don't have a license)? ***

Why do you need synthesis for Altera Stratix? You just said you have a Xilinx ZCU104. Stratix is a different and incompatible FPGA family that requires a different toolchain (Quartus instead of Vivado).

Any contractor you can hire would have to buy a $4000-5000 commercial license for Quartus Pro to synthesize for Stratix 10. He can also use a temporary evaluation license or convince Intel that he is doing some important research for the good of Mankind that warrants a free license.

Thank you,
Yuri Panchul

Kevin Cameron

May 27, 2026, 7:11:03 PM

to yu...@panchul.com, SystemVerilog Meetups in Silicon Valley

Claude's latest update -

https://kev-cam.github.io/ldx/

Code is under - https://github.com/kev-cam/ldx

My simulator is - https://github.com/kev-cam/nvc - but the overarching project is -

https://github.com/kev-cam/sv2ghdl/blob/main/README.md

Claude Code can probably get it working on any FPGA you have.

Regards,

Kev.

Yuri Panchul

May 27, 2026, 8:16:21 PM

to Kevin Cameron, yu...@panchul.com, SystemVerilog Meetups in Silicon Valley

Kevin:

I am not sure what you are trying to communicate. You put links to several projects, including (based on my quick look):

1. A proposal for an equivalent of the VHDL resolution function in SystemVerilog, which mentions analog Verilog-AMS.

2. A SystemVerilog to VHDL translator with the claim of improved X-propagation and user-defined nets.

3. A linker that is capable of translating calls from C functions into a C-based design running on an FPGA board.

Can you give a specific real-world example where (1) or (2) is useful? I am not aware of such a use case based on my MIPS/Juniper/Samsung and other experience.

As far as (3) is concerned, I can see that your c2v.py (C function to synthesizable Verilog) translator can produce only combinational logic. I do not see any value in it, because the interface between a C program running under Linux on the ARM core in Zynq and the FPGA logic is going to be the bottleneck, and there is no point in spending numerous cycles to run a small combinational function on FPGA fabric that runs at a much lower clock frequency than the program running on the CPU.
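
The interface-bottleneck argument can be illustrated with a small break-even calculation. This is a sketch under stated assumptions, not a measurement of ldx or of any Zynq board: the function name, the 200 ns CPU-to-fabric round trip, and the clock frequencies are all hypothetical values picked only to show the shape of the trade-off.

```python
def offload_comparison(cpu_cycles: int, cpu_mhz: float,
                       fabric_cycles: int, fabric_mhz: float,
                       round_trip_ns: float):
    """Compare running a function on the CPU with offloading it to FPGA fabric.

    Returns (offload_is_faster, cpu_ns, offload_ns).
    cycles / MHz gives microseconds, so multiply by 1000 to get nanoseconds.
    """
    cpu_ns = cpu_cycles * 1000.0 / cpu_mhz
    offload_ns = round_trip_ns + fabric_cycles * 1000.0 / fabric_mhz
    return offload_ns < cpu_ns, cpu_ns, offload_ns


if __name__ == "__main__":
    # Hypothetical numbers: a 30-cycle function on a 1200 MHz CPU versus
    # a 1-cycle combinational block on 100 MHz fabric, reached through an
    # interface with an assumed 200 ns round trip.
    faster, cpu_ns, offload_ns = offload_comparison(30, 1200.0, 1, 100.0, 200.0)
    print(f"CPU: {cpu_ns:.1f} ns, offload: {offload_ns:.1f} ns, "
          f"offload faster: {faster}")
```

Under these assumed numbers the CPU version takes 25 ns and the offloaded version 210 ns, so the round trip dominates. The offload only pays off when the work saved per call is much larger than the interface latency.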

If you have a demo of something useful running on a Zynq board, why don't you put some short readable slides outlining the practical usefulness and a video of the design running on a board?

In the repository, there is a mess of files, for example:

../../../arv/RISC-V.srcs/asynchronous/cpu/decoder.vhdl is referred to from Quartus. Where is this file, and why do you need it for your simulation accelerator?

https://github.com/kev-cam/ldx/blob/main/fpga/vexriscv/GenLdxCpu.scala - what is this file doing in the repository? Is VexRISCV a part of your acceleration?

Thank you,

Yuri Panchul

Kevin Cameron

May 28, 2026, 2:15:52 AM

to Yuri Panchul, yu...@panchul.com, SystemVerilog Meetups in Silicon Valley

In-line -

On 5/27/2026 5:16 PM, Yuri Panchul wrote:

Kevin:

I am not sure what you are trying to communicate. You put links to several projects, including (based on my quick look):

1. A proposal for an equivalent of the VHDL resolution function in SystemVerilog, which mentions analog Verilog-AMS.

The main project is a simulator that handles VHDL, Verilog-AMS and SystemVerilog by translating all into something NVC can handle with Xyce doing the analog part.

2. A SystemVerilog to VHDL translator with the claim of improved X-propagation and user-defined nets.

Yep.

The "3D" logic approach removes the X-propagation problem (it's in the nvc libs). That's a UDN in SV and VHDL; I added the functionality required for describing bidirectional components using it (missing for 40/10+ years).

3. A linker that is capable of translating calls from C functions into a C-based design running on an FPGA board.

ldx is a smart linker that rewrites the code to take advantage of available accelerators, which can be on an FPGA.

Can you give a specific real-world example where (1) or (2) is useful? I am not aware of such a use case based on my MIPS/Juniper/Samsung and other experience.

I do DV for things like power electronics and RF, so it's useful. 0/1/X/Z logic doesn't translate properly on AMS boundaries, and the 3D logic is more efficient than 01XZ in simulation.

As far as (3) is concerned, I can see that your c2v.py (C function to synthesizable Verilog) translator can produce only combinational logic. I do not see any value in it, because the interface between a C program running under Linux on the ARM core in Zynq and the FPGA logic is going to be the bottleneck, and there is no point in spending numerous cycles to run a small combinational function on FPGA fabric that runs at a much lower clock frequency than the program running on the CPU.

It's useful because you can define processor extensions as routines (for RISC-V), which makes code forward and backward compatible. You supply code with the routine in it, and ldx will patch the binary to use the accelerated version if it is present. In one case that was a 30 to 1 cycle reduction.

It works better if the FPGA is bolted into a basic RISC-V CPU pipeline, but that hardware currently doesn't exist, so I'm using soft-cores.

There's no reason the translation can't be multi-cycle, we just haven't done that yet.

It works for any ISA.

If you have a demo of something useful running on a Zynq board, why don't you put some short readable slides outlining the practical usefulness and a video of the design running on a board?

In the repository, there is a mess of files, for example:

../../../arv/RISC-V.srcs/asynchronous/cpu/decoder.vhdl is referred to from Quartus. Where is this file, and why do you need it for your simulation accelerator?

https://github.com/kev-cam/ldx/blob/main/fpga/vexriscv/GenLdxCpu.scala - what is this file doing in the repository? Is VexRISCV a part of your acceleration?

VexRISCV is the core being used in the processor array. You'll have to ask Claude why the files are there; it writes the code and the docs.

The demo code that got uploaded today is an RTL simulation accelerator, offering a 1000x speed-up over Verilator, I think folks will find that useful (if it works as Claude claims).

What other kind of demo would you like? Maybe a scalable GPU?

Regards,
Kev.

Yuri Panchul

May 28, 2026, 2:37:27 PM

to Kevin Cameron, yu...@panchul.com, SystemVerilog Meetups in Silicon Valley

Kevin:

*** The demo code that got uploaded today is an RTL simulation accelerator, offering a 1000x speed-up over Verilator, I think folks will find that useful (if it works as Claude claims). What other kind of demo would you like? Maybe a scalable GPU? ***

Let's keep the testing simple. You claim the sandwich of technologies produced by Claude (a SystemVerilog software simulator running on an array of VexRISCV cores synthesized and uploaded into a Xilinx UltraScale FPGA) runs a representative test 1000 times faster than Verilator, right? Then it should surely run it even faster than Icarus Verilog, right?

Let's take a small RTL example, for instance 4_2_9_a_plus_b_using_wrapped_fifos from the basics-graphics-music repository.

https://github.com/yuri-panchul/basics-graphics-music/blob/main/labs/4_microarchitecture/4_2_fifo/4_2_9_a_plus_b_using_wrapped_fifos

Let's modify the test a little bit: increase the number of transfers from 100 to 10000000 (ten million) and the timeout from 10 thousand to 1 billion. Let's also comment out logging and turn off the VCD dump. Under Icarus Verilog, the modified test runs for 4 minutes 30 seconds (270 seconds) on my old Dell OptiPlex 3050 computer (i5-6500T 2.50GHz, ~8GB memory) and for 1 minute 45 seconds (105 seconds) on my Mac Mini M4.

The modified testbench is here:

https://github.com/yuri-panchul/basics-graphics-music/tree/main/labs/4_microarchitecture/4_2_fifo/4_2_9_a_plus_b_using_wrapped_fifos_benchmark

I propose that you run both the original test and the modified test on your Claude-generated thingy on your ZCU104 board. If it produces functionally correct results on the original test and runs the modified test in less than 10 seconds (a 27x speedup compared to Icarus on the Dell and about 10x on the Mac), there is no testbench cheating (i.e. displaying the expected output without actually running the test), and you do all of this by the 4th of July 2026, then I will pay you $1000. If not, you pay me $1000. Deal?

Note this is a very generous deal for you. Claude claims 1000x speedup, I want you to prove merely 10x speedup.
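
The speedup figures in the proposed deal follow directly from the Icarus run times quoted above. A minimal sketch of the arithmetic, assuming only those two baselines and the 10-second target (the function and constant names are illustrative, not taken from the benchmark repository):

```python
# Icarus Verilog run times for the modified test, as quoted in this thread.
BASELINES_S = {
    "Dell OptiPlex 3050 (i5-6500T)": 270.0,
    "Mac Mini M4": 105.0,
}
TARGET_S = 10.0  # the modified test must finish in less than this


def speedup(baseline_s: float, measured_s: float) -> float:
    """How many times faster the measured run is than the baseline."""
    return baseline_s / measured_s


def meets_target(measured_s: float, target_s: float = TARGET_S) -> bool:
    """True when the run is strictly under the target time."""
    return measured_s < target_s


if __name__ == "__main__":
    for machine, baseline_s in BASELINES_S.items():
        print(f"{machine}: finishing in {TARGET_S:.0f} s "
              f"is a {speedup(baseline_s, TARGET_S):.1f}x speedup")
```

A run that takes exactly 10 seconds does not qualify under this reading, since the condition is "less than 10 seconds".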

Thank you,

Yuri Panchul

Kevin Cameron

May 28, 2026, 3:04:25 PM

to Yuri Panchul, yu...@panchul.com, SystemVerilog Meetups in Silicon Valley

I'll try to get that one done today

Later....

Kev.
