Sunday, March 10, 2013

Test-Creep: Selective Test Execution for Node.js

Check out test-creep to run your tests 10x faster.

They tell us to write tests before, during and after development. But they don't tell us what to do with this forever-running and ever-growing list of tests. If your tests take a double-digit number of seconds to execute then you're doing it wrong. Maybe you have already split your tests into fast unit tests that you run all the time, and slow integration tests that you run as needed. Wouldn't it be great to cherry-pick those 2-3 integration tests that are relevant to the change you just made and run them? Wouldn't it be great to make unit tests run even faster by executing just those few that are affected by your current work? Test-Creep automatically runs just the subset of tests that are affected by your current work. Best part: this is done with seamless Mocha integration, so you work as normal.

What is selective test execution?
Selective test execution means running just the relevant subset of your tests instead of all of them. For example, if you have 200 tests, and 10 of them are related to some feature, then if you make a change to this feature you should run only the 10 tests and not the whole 200. test-creep automatically chooses the relevant tests based on istanbul code coverage reports. All this is done for you behind the scenes and you can work normally with just Mocha.

Installation and usage
1. You should use Mocha in your project to run tests. You should use git for source control.
2. You need to have Mocha installed locally and run it locally rather than globally:

$> npm install mocha
$> ./node_modules/mocha/bin/mocha ./tests

3. You need to install test-creep:

$> npm install test-creep

4. When you run mocha, specify the special test 'first.js' before all other tests:

$> ./node_modules/mocha/bin/mocha ./node_modules/test-creep/first.js ./tests

first.js is bundled with test-creep and monkey patches mocha with the required instrumentation (via istanbul).

In addition, it is recommended to add .testdeps_.json to .gitignore (more on this file below).

How does this work?
The first time you execute the command, all tests run. first.js monkey patches mocha with istanbul code coverage and tracks the coverage per test (rather than for the whole process). Based on this information test-creep creates a test dependency file in the root of your project (.testdeps_.json). The file specifies for each test which files it uses:



Next time you run the tests (assuming you add first.js to the command) test-creep runs 'git status' to see which files were added/deleted/modified since the last commit. Then test-creep searches the dependency file to see which tests may be affected and instructs mocha to run only these tests. In the example above, if you have uncommitted changes only to lib/exceptions.js, then only the first test will be executed.
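
The selection step can be sketched roughly like this. It is only an illustration and not test-creep's actual code: the shape of the dependency map, the test names and the function name selectTests are all assumptions made for the example, since the real .testdeps_.json layout is not shown here.

```javascript
// Hypothetical dependency map: test name -> source files it touched.
// The real .testdeps_.json layout may differ.
const deps = {
  'exceptions are wrapped': ['lib/exceptions.js', 'lib/utils.js'],
  'config is loaded': ['lib/config.js'],
};

// Return the names of tests that depend on at least one changed file.
function selectTests(deps, changedFiles) {
  const changed = new Set(changedFiles);
  return Object.keys(deps).filter((test) =>
    deps[test].some((file) => changed.has(file))
  );
}

// Files reported as changed (for example, parsed from `git status`).
console.log(selectTests(deps, ['lib/exceptions.js']));
```

With a map like this, a change to lib/exceptions.js selects only the first test, and a change to a file that no test touches selects nothing.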

At any moment you can run mocha without the 'first.js' parameter, in which case all tests will run and not just the relevant ones.

When to use test-creep?
The test-creep sweet spot is long-running test suites, where it can save many seconds or minutes each time you run tests. If you have a test suite that runs super fast (< 2 seconds) then test-creep will probably add more overhead than it saves. However, whenever tests run for longer than that, test-creep can save you time.

More information
On GitHub, or ask me on Twitter.

What's next? get this blog rss updates or register for mail updates!

Monday, October 8, 2012

Crazy Social Analytics for C# Nuget

I'm happy to announce that C# nuget projects get some GitMoon love! This means you can get crazy social analytics about nuget projects. Check out your favorite ones:

SignalR
ServiceStack
Mono.Cecil
Facebook
Hammock
sqlite-net

Or check out famous head to head comparisons:

SignalR vs. ServiceStack
Mono.Cecil vs. LibGit2Sharp
sqlite-net vs. FluentMongo
TweetSharp vs. Facebook
Hammock vs. EasyHttp






Thursday, October 4, 2012

10 Caveats Neo4j users should be familiar with

UPDATE: Michael Hunger from neo4j responds to some of my items in a comment.

Recently I used the Neo4j graph database in GitMoon. I have gathered some of the tricky things I learned the hard way, and I recommend that any Neo4j user take a look.

1. Execution guard
By default queries can run forever. This means that if you have accidentally (or on purpose) sent the server a long-running query with many nested relationships, your CPU may be busy for a while. The solution is to configure neo to terminate all queries that run longer than some threshold. Here's how you do this:

In neo4j.properties add this:

execution_guard_enabled=true

Then in neo4j-server.properties add this:

org.neo4j.server.webserver.limit.executiontime=20000

where the limit value is in milliseconds, so the above will terminate each query that runs over 20 seconds. Your CPU will thank you for it!


2. ID (may) be volatile
Each node has a unique ID assigned to it by neo, so in your Cypher you could do something like:

START n=node(1020) RETURN n

START n=node(*) where ID(n)=1020 return n

Both queries return the same node.

Early on I was tempted to use this ID in urls of my app:

/projects/1020/users


This was very convenient since I did not have a numeric natural key for nodes and I did not want the hassle of encoding strings in urls.

Bad idea. IDs are volatile. In theory, when you restart the db all nodes may be assigned different IDs. IDs of deleted nodes may be reused for new nodes. In practice, I have not seen this happen, and I believe that with the current neo versions this will never happen. However, you should not take it as guaranteed and should always come up with your own unique way to identify nodes.
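
One possible way to get such an identifier is to generate your own key in app code and store it as a node property. The sketch below is just one option among many; the property name "uid" and the helper name newNodeKey are assumptions for the example, not something neo provides.

```javascript
const crypto = require('crypto');

// Generate an application-level key to store as a node property
// (the property name "uid" is an assumption for this example).
function newNodeKey() {
  return crypto.randomBytes(8).toString('hex'); // 16 hex characters
}

// Use the generated key in urls instead of the internal node ID.
const project = { name: 'async', uid: newNodeKey() };
console.log('/projects/' + project.uid + '/users');
```

The key is url-safe, so it keeps the convenience of the numeric ID without depending on how neo assigns IDs internally.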

3. ORDER BY lower case
There is no built-in function that allows you to return results ordered by some field in lower case. You have to maintain a shadow field with the lower case values. For example:

RETURN n.name ORDER BY n.name_lower
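
Maintaining the shadow field is easiest in app code, right before you save the node. Here is a minimal sketch, assuming you build the node properties as a plain object; the helper name withLowerShadow is made up for this example.

```javascript
// Add a lower-cased shadow copy of a property for ordering.
// The "_lower" suffix follows the n.name_lower convention above.
function withLowerShadow(props, field) {
  const copy = Object.assign({}, props);
  copy[field + '_lower'] = String(props[field]).toLowerCase();
  return copy;
}

console.log(withLowerShadow({ name: 'Mocha' }, 'name'));
```

Remember to call it on every update of the original field too, otherwise the two values drift apart.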

4. Random rows
There is no built-in mechanism to return a random row.

The easiest way is a two-phase randomization: first select the COUNT of available rows, then draw a random number in your app and SKIP that many rows:

START n=node(*)
WHERE n.type='project'
RETURN count(*)

// result is 1000
// now in your app code you make a draw and the random number is 512

START n=node(*)
WHERE n.type='project'
RETURN n
SKIP 512
LIMIT 1

An alternative is to use statistical randomization:

START n=node(*)
WHERE n.type='project' AND ID(n)%20=0
RETURN n
LIMIT 1

where 20 is a number you generated in your code. Of course this will never be fully random, and it also requires some knowledge of how the values are distributed, but in many cases it may be good enough.
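
The app-side half of the two-phase approach can be sketched like this. It is a rough illustration only: the function name buildRandomRowQuery is an assumption, and running the resulting query is left to whatever driver you use.

```javascript
// Phase 2 of the two-phase approach: given the count returned by the
// first query, draw a random offset and build the second query.
// `random` is injectable so the draw can be made deterministic in tests.
function buildRandomRowQuery(count, random = Math.random) {
  if (!Number.isInteger(count) || count <= 0) {
    throw new Error('count must be a positive integer');
  }
  const skip = Math.floor(random() * count); // 0 .. count-1
  return [
    'START n=node(*)',
    "WHERE n.type='project'",
    'RETURN n',
    'SKIP ' + skip,
    'LIMIT 1',
  ].join('\n');
}

console.log(buildRandomRowQuery(1000));
```

The offset is an integer computed in code, never user input, so building the query text this way does not open the injection hole described in #9.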


5. Use the WITH clause when cypher has expensive conditions
Take a look at this cypher:

START n=node(...), u=node(...)
MATCH p = shortestPath( n<-[*..5]-u) WHERE n.forks>20 AND length(p)>2
RETURN n, u, p

Here we will calculate the shortest path for all nodes. This is a CPU-intensive operation. How about separating concerns like this:

START n=node(...), u=node(...)
WHERE n.forks>20
WITH n as n, u as u
MATCH p = shortestPath( n<-[*..5]-u ) WHERE length(p)>2
RETURN n, u, p

Now the path is only calculated on relevant nodes, which is much cheaper.


6. Arbitrary depth is evil
Always strive to limit the max depth of queries you perform. Each depth level increases the query complexity:

...
MATCH (n)<-[:depends_on*0..4]-(x)
...

7. Safe shutdown on windows
When you run Neo4j on Windows in interactive mode (i.e. not as a service) do not close the console with the x button. Instead, always use CTRL+C and then wait a few seconds until the db is safely closed and the window disappears. If by mistake you did not close it safely, then the next start will be slower (it can take a few minutes or more) since neo will do recovery. In that case the log (see #8) will show this message:

INFO: Non clean shutdown detected on log [C:\Program Files (x86)\neo4j-community-1.8.M03\data\graph.db\index\lucene.log.1]. Recovery started ...

8. The log is your best friend
When crap hits the fan, always turn to /data/log. Especially if neo does not start, you may find out there that you have misconfigured some setting or that recovery has started (see #7).

9. Prevent cypher injection
Take a look at this code:

"START n=node(*) WHERE n='"+search+"' RETURN n"

If "search" comes from an interactive user then you can imagine what kind of injections are possible. The correct way is to use Cypher parameters, which any driver should expose an API for. If you use the awesome node-neo4j API by aseemk you could do it like this:

qry = "START n=node(*) WHERE n={search} RETURN n"
db.query qry, {search: "async"}
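
To see the difference, here is a small driver-independent sketch of what each approach produces for a malicious input. The helper names unsafeQuery and parameterizedQuery are made up for the illustration; with a real driver you would pass the query and the params object to its query call.

```javascript
// Unsafe: user input becomes part of the query text.
function unsafeQuery(search) {
  return "START n=node(*) WHERE n='" + search + "' RETURN n";
}

// Safer: the query text is constant; the value travels separately.
function parameterizedQuery(search) {
  return {
    query: 'START n=node(*) WHERE n={search} RETURN n',
    params: { search: search },
  };
}

const evil = "x' OR '1'='1";
console.log(unsafeQuery(evil));
console.log(parameterizedQuery(evil).query);
```

In the first case the input closes the quote and adds its own OR condition to the query. In the second case the query text never changes, whatever the user types.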

10. Where to get help
The Neo4j Google group or the community github project are very friendly and responsive.


Wednesday, October 3, 2012

How I Built GitMoon

I got some queries on how I built GitMoon so I decided to come up with this list in BestVendor:

How I Built a Viral Node.js App in Just One Weekend

You can read there about the technology and tool choices I've made and why. Got a cool cover image too. Check it out in BestVendor.



Tuesday, October 2, 2012

MongoDB and Redis go head to head with Node.js social analytics

And not just mongo vs. redis but also jade vs. ejs, azure vs. jitsu and anything else you want! All in today's GitMoon new rollout. Here are some of the amazing geo-social visualizations you get when you compare two projects:






More visualizations are in GitMoon.

Check out some of the popular head-2-head comparisons:


Thursday, September 27, 2012

GitMoon now has country, company drill down and some amazing graphs too

Check out the great new stuff in GitMoon

Ever since I published GitMoon a couple of months ago I have been getting great feedback on Twitter and by mail. A lot of you also hinted at what you want to see next. So now is the time to thank you all for the feedback and also to show you what came out of it :)

Here's what has just been deployed to GitMoon:

Friends country / company drill down


Now when you're in the "users" tab you have an option to analyze the project users by country / company / project. So you can answer questions like "how many express.js users are from China" or "How many Yahoo employees use mongoose".

If you take a look at the side map you can also drill down into US states.


Dependency force-directed graph
This amazing piece of d3.js magic shows you all the dependencies of a project, and their dependencies, all the way up. You can access it via the "projects" tab.


CodeBack drill down
CodeBack is one of GitMoon's most useful features. Previously it was just one big list of all usages of the shown project. Now you can filter by the calling project, which makes it very useful for both module authors and consumers.


Amazing new landing page
The landing page is the project's face so I decided to give it some SVG love.


Go have fun with your favorite projects!

If you love GitMoon please tweet to @YaronNaveh and your universe.


Tuesday, July 31, 2012

GitMoon is social analytics for Node.js open-source developers


Check out GitMoon - social analytics for Node.js open-source projects!

Open source is fun. Sure, a lot of hard work is involved, but it's great to do something for the community! Now how frustrating is it to publish our project in the wild and never know who is using it (who as in face and picture)? Or to never know how successful our project is? Not to mention seeing what a typical usage looks like so we can improve the next versions.

Embracing GitHub
Github was an amazing game changer here. I love GitHub (Ben and Marc seem to concur) and use it a lot. I also predict great success for github enterprise in this era of IT consumerization: Open source developers are consumers and no old school CIO will tell them which scm to use. And here comes the but: Github is not perfect for the social and analytics needs of the community.

(tl;dr rant) What does it mean to watch a project in github? Does GH watch == Facebook like? Is it "I use it"? It sure is spamming my feed with every check-in made to that project. I LIKE node.js but I can't WATCH it. Too much noise. Or let's talk analytics. The GH analytics module is very oriented to giving consumers visibility into how viable and alive a project is. This is a great decision-support tool for them. But let's not forget the project developer! We, developers, want to know who is using us. Who as in name and face. Social, you know. We want to know how successful our project is. How many people use it? How many projects depend on it? If 1 project depends on my project, and 10 projects depend on that one project, and 10 more projects depend on each one of them, then the way I see it the number of projects that depend on me is 1+10+10*10! Moreover, it's no longer just the 5 users watching my project, but the hundreds of users watching every single project in the "network"!

Wouldn't it be cool to have visibility into all this?



So is it an ego thing? While there is nothing wrong with it, you should stay out of the github kitchen if you don't want anyone forking out with "your" stuff. But there's far more than ego here. You want to know how your project is being used so you can decide on the next steps and the next milestone priorities. How about flipping through a journal with all the code excerpts that use your code? You'll love CodeBacks:


Ever wondered if your project needs to co-exist with fingernails toejam 2.0? Knowing what other libraries your project's users employ alongside it can hint at your real testing priorities. Meet similar projects:


All this git
GitMoon embraces GitHub. To start with, it uses GitHub's excellent API. The first edition of GitMoon is node.js flavoured, so npm information is also used. Npm is an amazingly simple, gets-the-job-done package manager - you're now able to analyze it.

Not sure where to go next? Try async, mocha, mongoose, azure, ws.js or any of your favorite node projects in GitMoon.
