David Soergel

Reproducible Research: WorldMake.org

Open Peer Review: OpenReview.net

Metagenomics: RTAX, QIIME

GitHub: davidsoergel

Twitter: @loraxorg

Open Workflows: A Vision for Collaborative Science.

2014 Oct 23

These are slides from my 3-minute lightning talk at the Bay Area Open Access Week 2014 event (#oaw2014sky). I think you can imagine the story from the cartoons alone; please ask for clarification in the comments as needed!

Confirmation Depth as a measure of reproducible scientific research.

2014 Oct 21

In striving for reproducible science, we need to be very clear on what it actually means to reproduce a result. The concept of confirmation depth helps to describe which possible sources of error in the original study have been eliminated in the reproduction.

Scientific experiments are derivation networks.

Rampant software errors undermine scientific results.

2014 Oct 15

An updated version of this post is now a paper at F1000Research.

Errors in scientific results due to software bugs are not limited to a few high-profile cases that lead to retractions and are widely reported. Here we estimate that in fact most scientific results are probably wrong if data have passed through a computer, and that these errors may remain largely undetected. The opportunities for both subtle and profound errors in software and data management are boundless, and yet bafflingly underappreciated.

All data passing through a computer is suspect.

Protest surveillance by encrypting your email.

2013 Aug 06

In response to my recent suggestion that we encrypt as much email as possible (especially since it is finally convenient to use MacGPG with Mail.app on OS X 10.8; more on that later), a friend asked:

Why encrypt? It just raises your profile. If I've got something to say that really needs to be private, I think email is right out.

Good question, and one that takes some soul-searching.

GPG keys update, 2013

2013 Jun 25

Here are my GPG keys, current as of 2013.

Resolving software component dependencies using compatibility tests

2012 Jun 26

Modern software frequently depends on preexisting components, which in turn have their own dependencies. Managing these dependencies (e.g., automatically downloading the correct set of prerequisites) is a substantial industry that touches nearly every software development effort. In the Java world, Maven is the dominant mechanism; Scala users may use SBT; Perl provides CPAN; other languages have systems at varying stages of development; and Linux package managers address essentially the same issue. These solutions remain awkward, particularly when different components specify conflicting requirements. Developers spend untold hours in the dreaded "dependency hell", trying to establish a mutually compatible set of dependencies just to allow a simple program to compile. Some see this problem as hopeless, and provide mechanisms that allow composing arbitrary components while isolating conflicting areas from one another (e.g., OSGi, Maven Shade). That is not always practical, especially for the vast majority of smaller-scale projects where such engineering investments are out of scope. At the same time, I think that we can gain some clarity on dependency management by designing the supporting infrastructure in the right way.
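This excerpt does not spell out the proposed mechanism, so the following is only a rough Python sketch of what "establishing a mutually compatible set of dependencies" involves. It assumes each component offers a few candidate versions and that some pairwise compatibility check exists; in practice that check might be a test suite run against the pair. The names `find_compatible_set`, `candidates`, `known_bad`, and the version numbers are all hypothetical.

```python
from itertools import product


def find_compatible_set(candidates, compatible):
    """Brute-force search for one version per component such that every
    pair of chosen (component, version) tuples passes `compatible`.

    candidates: dict mapping component name -> list of versions,
                most preferred first.
    compatible: function taking two (component, version) tuples and
                returning True if they work together.
    Returns a dict component -> version, or None if no combination works.
    """
    names = sorted(candidates)
    for versions in product(*(candidates[n] for n in names)):
        chosen = list(zip(names, versions))
        all_pairs_ok = all(
            compatible(a, b)
            for i, a in enumerate(chosen)
            for b in chosen[i + 1:]
        )
        if all_pairs_ok:
            return dict(chosen)
    return None


# Hypothetical data: pairs that a compatibility test has shown to fail.
known_bad = {
    frozenset({("app", "1.0"), ("json", "2.1")}),
    frozenset({("http", "3.0"), ("json", "2.0")}),
}

candidates = {
    "app": ["1.0"],
    "http": ["3.0", "2.5"],
    "json": ["2.1", "2.0"],
}

chosen = find_compatible_set(
    candidates,
    lambda a, b: frozenset({a, b}) not in known_bad,
)
```

With this made-up data the search settles on the older `http` 2.5 and `json` 2.0, because the newest versions each conflict with something else. Exhaustive search is exponential in the number of components, so a real resolver would need pruning or a constraint solver.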

Proposal for a tabless web browser

2012 Apr 09

Browser tabs, bookmarks, and social bookmarking services considered harmful; a unified approach to web browsing is needed.

Notes on Yann LeCun's publishing proposal

2012 Mar 09

These are brainstorms on Yann LeCun's pamphlet, A New Publishing Model in Computer Science.

Update (May 2013) Since this was written, we implemented a lot of it at OpenReview.net, and wrote a fairly comprehensive paper about it.

Visualizing a todo list with Treemaps

2011 Mar 11

I frequently find that I'd like a high-level overview of the tasks on my todo list, but the OmniFocus list views can be unwieldy: there may be too many items to see them all at once, and I sometimes find it hard to keep track of which items are most time-consuming and/or most urgent (overdue, or flagged).

A "treemap" provides a natural visualization for this kind of data, because it can simultaneously represent hierarchical structure (through the 2D layout) and two continuous variables (tile size and color), and guarantees that everything fits on one screen.
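As a rough illustration of how a treemap turns such data into tiles, here is a minimal slice-and-dice layout sketch in Python. The `Task` fields (`minutes`, `urgency`) and the `layout` function are hypothetical names chosen for this example, not the implementation behind the post. Tile area tracks estimated time, and urgency is simply carried along as the value a renderer would map to color.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    minutes: float = 0.0   # estimated effort (used for leaf tasks only)
    urgency: float = 0.0   # 0.0 = calm, 1.0 = overdue; would drive tile color
    children: list = field(default_factory=list)

    def total(self) -> float:
        """Effort of this task, or the summed effort of its subtree."""
        if not self.children:
            return self.minutes
        return sum(child.total() for child in self.children)


def layout(task, x, y, w, h, depth=0):
    """Slice-and-dice treemap layout.

    Returns a list of (name, x, y, w, h, urgency) tuples, one per leaf,
    that together tile the rectangle (x, y, w, h).
    """
    if not task.children:
        return [(task.name, x, y, w, h, task.urgency)]
    tiles = []
    total = task.total()
    offset = 0.0  # fraction of the parent already handed out
    for child in task.children:
        share = child.total() / total if total else 0.0
        if depth % 2 == 0:
            # Even depth: divide the parent's width among the children.
            tiles += layout(child, x + offset * w, y, share * w, h, depth + 1)
        else:
            # Odd depth: divide the parent's height among the children.
            tiles += layout(child, x, y + offset * h, w, share * h, depth + 1)
        offset += share
    return tiles


# Hypothetical todo list: two projects, three leaf tasks.
todo = Task("all", children=[
    Task("work", children=[
        Task("write paper", minutes=60, urgency=0.9),
        Task("review PR", minutes=30, urgency=0.2),
    ]),
    Task("home", children=[
        Task("chores", minutes=30, urgency=0.5),
    ]),
])

tiles = layout(todo, 0, 0, 100, 100)
```

Each leaf ends up with an area proportional to its share of the total minutes, and the whole list always fills exactly the rectangle it is given. Slice-and-dice tends to produce thin slivers; squarified layouts are a common alternative when tile aspect ratio matters.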

Contract Tests with TestNG: unit testing against interfaces and abstract classes

2011 Aug 22

An aspect of testing Java programs that seems to me fairly neglected is testing conformance to interfaces, and (nearly identically) testing that the functionality of abstract classes works properly in all concrete subclasses. Certainly this has been mentioned before, generally under the name Abstract Tests or Contract Tests. Also, the idea seems to me very much in keeping with Behaviour Driven Development (BDD) and the Design by Contract (DbC) philosophy.

TestNG does not explicitly support Contract Tests, as far as I can tell, but it’s fairly easy to make it work using the little trick I describe below.
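The TestNG trick itself is not reproduced in this excerpt. As a language-neutral illustration of the general contract-test idea, here is a minimal sketch using Python's standard `unittest` module rather than TestNG. The stack example and the names `StackContract`, `ListStack`, and `make_stack` are hypothetical. The contract is written once against the interface, and each concrete implementation gets a small test class that supplies only a factory method.

```python
import unittest


class StackContract:
    """Tests that any stack implementation is expected to pass.

    This mixin deliberately does not inherit from unittest.TestCase,
    so the test runner never tries to run it on its own.
    """

    def make_stack(self):
        raise NotImplementedError("concrete test classes supply the instance")

    def test_new_stack_is_empty(self):
        self.assertTrue(self.make_stack().is_empty())

    def test_pop_returns_last_pushed(self):
        stack = self.make_stack()
        stack.push(1)
        stack.push(2)
        self.assertEqual(stack.pop(), 2)


class ListStack:
    """One concrete implementation under test."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()

    def is_empty(self):
        return not self._items


class ListStackTest(StackContract, unittest.TestCase):
    """Binds the contract to ListStack; inherits every contract test."""

    def make_stack(self):
        return ListStack()
```

Adding a second implementation then costs one more small subclass of `StackContract`, and any test added to the contract later applies to every implementation automatically.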

Thoughts on GPG Key Management

2008 Nov 09

I am retiring my old GPG key because I've lost confidence that the private part is completely secret (i.e., who knows what backups it may be on, and where those have ended up). Also, I've used the passphrase on the old key in other contexts, which is not so good. Of course I want to be more careful with the new private key, in hopes of keeping it trustworthy indefinitely. Hence the following scheme:

GPG keys update, 2008

2008 Nov 09

Here are my GPG keys, current as of 2008.

s3napback: Cycling, Incremental, Compressed, Encrypted Backups to Amazon S3

2008 May 07

In searching for a way to back up one of my Linux boxes to Amazon S3, I was surprised to find that none of the many backup methods and scripts I found on the net did what I wanted, so I wrote yet another one.