Poll on your dev environment & queries on dev environment

Eunice Soh

Mar 13, 2022, 10:34:42 PM
to Dataverse Dev
Hi all,

Would like to poll the community on your dev environment, e.g. what works for you, why the setup (e.g. OS constraints), and/or what you'd recommend.



Here're some options gathered based on the docs https://guides.dataverse.org/en/latest/developers/dev-environment.html. Have added some queries too.

1. netbeans/payara/postgres/solr installed on a local Mac/Linux machine

This seems like the most straightforward option, e.g. https://guides.dataverse.org/en/latest/developers/dev-environment.html


2. vagrant/payara/postgres/solr installed on a virtual machine, using local Windows machine

This seems like an option for those using Windows e.g. https://guides.dataverse.org/en/latest/developers/windows.html

Q: How's development done here? Using netbeans as well and how?


3. docker, using a local Linux machine

Recently read that Docker can also be used in a dev environment: https://docs.docker.com/desktop/dev-environments/

Q: Can this be used for development, and how?


4. kubernetes, using a local Linux machine

Q: Can this be used for development, and how?



Other questions

1. Which types of tests must be run, or are good to run, on the local dev environment prior to a pull request? Unit tests, integration tests (API tests / Testcontainers tests)? All unit tests, or just those associated with the code that was modified? Where should these be run (e.g. Docker on a cloud instance, or can they run without Docker)?

2. What is run in continuous integration using Jenkins and GitHub Actions, e.g. integration tests? The Jenkins link seems broken; is it actively being used?


Hope to hear which dev environments work best. Thank you!


Kind regards,
Eunice

Oliver Bertuch

Mar 14, 2022, 3:20:29 AM
to datave...@googlegroups.com

Hi Eunice,

I'm using IntelliJ IDEA as my IDE, as I like neither Eclipse nor NetBeans.


1. netbeans/payara/postgres/solr installed on a local Mac/Linux machine

This seems like the most straightforward option, e.g. https://guides.dataverse.org/en/latest/developers/dev-environment.html

Most devs at IQSS seem to use this option. (as I recently learned, most of it became muscle memory, so it's quick to use for 'em...)


2. vagrant/payara/postgres/solr installed on a virtual machine, using local Windows machine

This seems like an option for those using Windows e.g. https://guides.dataverse.org/en/latest/developers/windows.html

Q: How's development done here? Using netbeans as well and how?

Phil uses this sometimes for ephemeral setups, but it seems to be rarely used for dev environments.

These days, it's not the only option on Windows, though. There is also WSL, which is becoming more and more stable.

If you are going to do development on Windows, please remember to use the correct line ending setting within your IDE/editors/...!
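
One way to spot stray Windows line endings before committing is sketched below. This is only an illustration, not part of the Dataverse tooling: the helper name `find_crlf_files` and the set of file suffixes are assumptions made for the example.

```python
from pathlib import Path

# Hypothetical helper (not part of Dataverse): report text files that
# contain Windows (CRLF) line endings.
# The suffix selection below is an assumption; adjust it to your project.
TEXT_SUFFIXES = {".java", ".xml", ".properties", ".sh", ".sql", ".rst"}


def find_crlf_files(root, suffixes=TEXT_SUFFIXES):
    """Return the sorted paths under `root` whose content contains CRLF."""
    hits = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            # Read raw bytes so newline translation cannot hide the CRLF.
            if b"\r\n" in path.read_bytes():
                hits.append(path)
    return sorted(hits)
```

Calling `find_crlf_files(".")` from the root of a checkout lists the affected files. Git's `core.autocrlf` setting or a `.gitattributes` file can also normalise line endings, which may be preferable to checking by hand.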


3. docker, using a local Linux machine

Recently read that Docker can also be used in a dev environment: https://docs.docker.com/desktop/dev-environments/

Q: Can this be used for development, and how?

Not sure DANS is using this for development.


4. kubernetes, using a local Linux machine

Q: Can this be used for development, and how?

It could, but I'd advise against it for now (it's outdated). As the maintainer of the dataverse-kubernetes repo, I am trying hard to get away from creating the images in there, so it can focus on Kubernetes usage only.

For easy installations of Docker on Windows, Mac (even Linux), the Minikube project is still a great option to avoid "Docker for Desktop" license costs (although it's a great product worth the money).


Another upcoming option is my feature branch at https://github.com/gdcc/dataverse, allowing container usage right from the Maven CLI. Please note this is WIP, so it's not advertised anywhere yet and it may break any time. With the recent pickup of Maven modules within upstream, I am working on making at least the Solr image an upstream thing. (https://github.com/IQSS/dataverse/pull/8320) Documentation for using this stuff will be added to the development guide if it gets merged.

I am always using containers for my dev environment, as it's very fast to set up and run all the things you need in clean, ephemeral ways.


Other questions

1. Which types of tests must be run, or are good to run, on the local dev environment prior to a pull request? Unit tests, integration tests (API tests / Testcontainers tests)? All unit tests, or just those associated with the code that was modified? Where should these be run (e.g. Docker on a cloud instance, or can they run without Docker)?

IMHO the Dataverse codebase suffers from low test coverage. Many features aren't tested in an automated fashion.

Currently, there are no Testcontainers-based tests. Again, I am working on making integration testing with this possible, based on the container efforts I am working on. (This can be tried within the https://github.com/poikilotherm/dataverse/tree/testcontainers feature branch. Again, this is WIP; expect things to be broken and out of date.)

Please note there is also the docker-aio option to run API tests locally. (I hope to replace this with TC...)


2. What is run in continuous integration using Jenkins and GitHub Actions, e.g. integration tests? The Jenkins link seems broken; is it actively being used?

What we have in terms of automated tests are JUnit based unit tests (may be executed from the IDE or CLI) and API tests (our best and only kind of integration testing). Unit tests are executed for each Pull Request and have to pass. The API tests are executed on Jenkins setting up EC2 instances to run these tests against, also for every PR. The Jenkins instance is prone to be targeted by bots, which is why it's not really accessible for outsiders. Ask Don for details (here, in Matrix or Community Slack).

My aim is to run API and future integration tests inside GitHub Actions with Testcontainers, making this stuff more accessible.

If you wanna chat, you're welcome to join us at https://chat.dataverse.org or Community Slack. Phil, Don and I are usually around (beware of timezone mushrooms...).


Best,
Oliver

-- 
-------------------------------------------------------------------------------------
Oliver Bertuch
Forschungszentrum Jülich GmbH
Zentralbibliothek / Central Library
Forschungsdatenmanagement / Research Data Management
Entwicklung von Forschungssoftware / Research Software Engineering

52425 Jülich
+49 2461 61-85370
https://www.fz-juelich.de/zb

Sitz der Gesellschaft: Jülich
Eingetragen im Handelsregister des Amtsgerichts Düren Nr. HR B 3498
Vorsitzender des Aufsichtsrats: MinDir Volker Rieke
Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender),
Karsten Beneke (stellv. Vorsitzender), Prof. Dr. Astrid Lambrecht,
Prof. Dr. Frauke Melchior
-------------------------------------------------------------------------------------

Eunice Soh

Mar 14, 2022, 5:22:49 AM
to Dataverse Dev

Thank you Oliver for sharing about the dev environment & the testing strategy... This is giving me a better picture now of the dev environment. 


> Another upcoming option is my feature branch at https://github.com/gdcc/dataverse, allowing container usage right from the Maven CLI. 
> Please note this is WIP, so it's not advertised anywhere yet and it may break any time. With the recent pickup of Maven modules within 
> upstream, I am working on making at least the Solr image an upstream thing. (https://github.com/IQSS/dataverse/pull/8320)
> Documentation for using this stuff will be added to the development guide if it gets merged.

> I am always using containers for my dev environment, as it's very fast to set up and run all the things you need in clean, ephemeral ways.

I see. Could you share more about how to set up and use the "containers for (your) dev environment"? It does seem like a very clean way compared to option #1.

E.g. are you only using Maven + IntelliJ? Are you referring to this branch: https://github.com/gdcc/dataverse/tree/develop+ct? What kind of containers are you spinning up using Maven, and where are the associated files (e.g. Docker uses a Dockerfile)?



I personally don't have a lot of bandwidth at work for dev work, but it would be nice to get my feet wet with small changes to the code, e.g. tests or small bug fixes.

Kind regards,
Eunice

Péter Király

Mar 14, 2022, 6:48:10 AM
to datave...@googlegroups.com
Dear Eunice,

I use IntelliJ and local servers on Linux, with some small changes. As
far as I remember, the dev scripts had some specific paths hardcoded,
e.g. for Maven, which did not fit my environment (which is Ubuntu, while
the official Dataverse documentation suggests Red Hat, and I guess Harvard
developers use that for local development as well).
The dataverse-docker and the dev Docker are two different things. The
first one is for running services (either for testing the service or
for production), not for development purposes. It uses a particular
stable release, while the dev Docker uses the actual state of the code
on your machine.

Best,
Péter
> --
> You received this message because you are subscribed to the Google Groups "Dataverse Dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to dataverse-de...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/dataverse-dev/1c513552-931b-4793-8685-6fee87051484n%40googlegroups.com.



--
Péter Király
software developer
GWDG, Göttingen - Europeana - eXtensible Catalog - The Code4Lib Journal
http://linkedin.com/in/peterkiraly

Philip Durbin

Mar 14, 2022, 10:31:02 AM
to datave...@googlegroups.com
Hi Eunice,

Thanks for asking!

> 1. netbeans/payara/postgres/solr installed on a local Mac/Linux machine

Like Oliver said, all the devs at IQSS, including myself, use this "install everything directly on a Mac" option. It's what we've always done for 10+ years.


> 2. vagrant/payara/postgres/solr installed on a virtual machine, using local Windows machine

I created the Vagrant environment a long time ago, before Docker existed. I got a new laptop somewhat recently and I don't even have Vagrant (or VirtualBox) installed so I haven't been using it or testing it recently. However, people seem to open issues when it breaks so it's obviously useful to some and we try to fix it when necessary. I think Vagrant is still a decent option for developers on Windows but I'm not sure. Obviously, we should better support developers on Windows. We are open to ideas.


> Q: How's development done here? Using netbeans as well and how?

Yes, I'd suggest using Netbeans. The whole git repo should be mounted under "/dataverse" in the VM.


> 3. docker, using a local Linux machine https://github.com/IQSS/dataverse-docker

dataverse-docker is under the IQSS GitHub org but it's managed and maintained by Slava. I've never used it, but people seem to like it.

When I think about using Dataverse in Docker, I always turn to docker-aio (all in one), which we document here: https://guides.dataverse.org/en/5.9/developers/testing.html#running-the-full-api-test-suite-using-docker

Like the Vagrant environment, we try to keep docker-aio working, including when we update dependencies such as Payara, PostgreSQL, or Solr.


> Recently read that Docker can also be used in a dev environment: https://docs.docker.com/desktop/dev-environments/ Q: Can this be used for development, and how?

As with Vagrant, I imagine you edit files using Netbeans or your favorite editor and then deploy into the environment. So sure, I'd say you can use it for dev.


> 4. kubernetes, using a local Linux machine https://github.com/gdcc/dataverse-kubernetes Q: Can this be used for development, and how?

I spun this up once in Germany when I was sitting next to Oliver on a couch. It worked fine. This was using minikube.


> 1. Which types of tests must be run, or are good to run, on the local dev environment prior to a pull request? Unit tests, integration tests (API tests / Testcontainers tests)? All unit tests, or just those associated with the code that was modified? Where should these be run (e.g. Docker on a cloud instance, or can they run without Docker)?

If I'm conscious of the fact that the code I'm editing has tests, I'll run those tests and probably add some. If not, I'll let Jenkins run all the tests and see if anything breaks. If I want to run all the tests locally, I'll use docker-aio.


> 2. What is run in continuous integration using Jenkins and GitHub Actions, e.g. integration tests? The Jenkins link seems broken; is it actively being used?

Like Oliver said, for security reasons Jenkins is not available to the outside. I would love for it to be open but I understand that for now it can't be. Jenkins runs all the API tests. I think the GitHub Actions are all open so you can see what is run by each of them. One of them makes sure the docs can build (`make html`) for example.

I hope this helps! Please keep the questions coming!

Phil





James Myers

Mar 14, 2022, 10:51:48 AM
to datave...@googlegroups.com

FWIW: I use Eclipse on Windows. For deployment, I usually use dataverse-ansible to spin up a test machine and then just repeatedly replace the war file (using ec2 stop/start instance to keep the db state etc. without running the instance all the time). I use Ubuntu on Windows for local tasks and as a way to log into the remote EC2 instances.

 

-- Jim


Andreev, Leonid

Mar 14, 2022, 11:54:21 AM
to datave...@googlegroups.com
I do most of my development in a most boring, by the book (by the guide) environment - Netbeans/everything local on MacOS. I spin up EC2 branches when I need to test building everything up from scratch. 

Durand, Gustavo

Mar 14, 2022, 12:59:43 PM
to datave...@googlegroups.com
Same for me as Leonid.


Jan Mansum, van

Mar 14, 2022, 1:38:37 PM
to Dataverse Dev
2. vagrant/payara/postgres/solr installed on a virtual machine, using local Windows machine

This seems like an option for those using Windows e.g. https://guides.dataverse.org/en/latest/developers/windows.html

Q: How's development done here? Using netbeans as well and how?


The approach we are using at DANS for the development of our data stations is similar to the one above. We use Vagrant to run a local copy of our remote test and prod servers. When developing Dataverse code, we (re-)deploy the exploded war file via Vagrant's shared folder on the VM. We have enabled remote debugging in Payara, so it is easy to attach IntelliJ and debug any problems. The Vagrant machine also contains our own microservices, such as external workflow steps, so we can even debug the interaction between Dataverse and the workflow steps. I like this approach for three reasons: the Vagrant machine is basically a copy of our production environment; it is easy to start from scratch if you have made a mess of your environment; and you can save an environment to a Vagrant base box to revive it at a later time.
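
Before attaching a debugger to a VM, it can help to confirm that the debug port is reachable from the host. The sketch below is only an illustration under stated assumptions: the helper name `port_is_open` is hypothetical, and 9009 is assumed as the debug port because it is the value commonly found in a default Payara/GlassFish domain configuration; your own setup and any Vagrant port forwarding may differ.

```python
import socket

# Assumption: 9009 is the debug port commonly configured for Payara/GlassFish
# domains started in debug mode. Check your own domain configuration.
ASSUMED_DEBUG_PORT = 9009


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

For example, `port_is_open("localhost", ASSUMED_DEBUG_PORT)` returning False would suggest checking the port forwarding or the debug settings before looking for a problem in the IDE.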

We have now settled on using IntelliJ for some years. I have used Eclipse before that, but have very limited experience with NetBeans.

Best regards,
Jan


 

Eunice Soh

Mar 16, 2022, 1:51:41 AM
to Dataverse Dev
Thank you all for the input; it is very helpful.

Just to summarise, also for reference. Hope it is captured accurately.

1. Testing

Types of test
When should tests be run
  • Tests should be added where appropriate & run when code is changed. 
  • At a minimum, unit tests should be run during development. Integration tests can also be run locally; if not, they run on Jenkins (remotely, on pull request). Jenkins pass/fail status can be viewed on pull requests, but Jenkins logs are not publicly viewable.
  • Unit tests (GitHub Actions) and API integration tests (Jenkins) are also run remotely on pull request.
2. Development environment
notes: Oliver (maintainer) advised this is outdated https://github.com/IQSS/dataverse-kubernetes. Phil once used minikube for this.

Philip Durbin

Mar 16, 2022, 10:31:23 AM
to datave...@googlegroups.com
Thanks, Eunice. Nice summary.

The only thing I'd like to clarify is that developers can certainly run API tests locally without using docker-aio. Because it takes so long to run the entire API test suite (Jenkins spends about 8 minutes on this, I believe), I like running them all in docker-aio because it runs on a different port which leaves my regular environment free for other hacking. If I'm just executing the tests in a single file or a single method, I just use my regular development environment. Note that dev environments need to be configured a certain way for the tests to run: https://guides.dataverse.org/en/5.9/developers/testing.html#getting-set-up-to-run-rest-assured-tests . In practice, rather than following all those manual steps, I use a "dev rebuild" script that gets my dev environment ready to run API tests: https://guides.dataverse.org/en/5.9/developers/troubleshooting.html#rebuilding-your-dev-environment
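
Running the tests in a single file or a single method, as described above, could look roughly like the sketch below. It only builds a Maven command line. The `-Dtest=` filter is the standard Maven Surefire option; everything else is an assumption for illustration: the helper name `build_api_test_command` is hypothetical, `DatasetsIT` and `testExample` are placeholder names, and the `dataverse.test.baseurl` property and port 8084 should be checked against the testing guide linked above.

```python
def build_api_test_command(test_class, method=None, base_url=None):
    """Build a `mvn test` command for one test class or one test method.

    Hypothetical helper for illustration; it is not part of Dataverse.
    """
    # Surefire accepts "ClassName" or "ClassName#methodName" as a filter.
    selector = test_class if method is None else f"{test_class}#{method}"
    cmd = ["mvn", "test", f"-Dtest={selector}"]
    if base_url is not None:
        # Assumed property name for pointing the tests at another
        # installation, e.g. one listening on a different port.
        cmd.append(f"-Ddataverse.test.baseurl={base_url}")
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the root of the repository, for example `build_api_test_command("DatasetsIT", "testExample", "http://localhost:8084")`.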
