The new StarCraft II: Heart of the Swarm is really fun to play.
According to sc2ranks.com, I am currently ranked #4862 in North America and #12192 on Earth.
I hope to get promoted to the Gold league before the season ends. The 2013 Season 2 ends on 2013-05-01. I am currently #8 in my division, which is Division Hierarch Delta.
The Bioinformatics Adventure
2013-04-20
2013-03-22
Cost of hosting Ray Cloud Browser during 1 month in AWS EC2
I got this question on Twitter: what has your bill totaled since you've started hosting ray viewer?
I guess ray viewer is just another name for Ray Cloud Browser ;-P.
For one month, I expect the costs to be at least 2.1915 $ (365.25/12*24*0.003) for my t1.micro spot instance and 6.4000 $ (64 * 0.10) for my 64 GiB Amazon EBS Standard volume. I/O requests are hard to predict, and I did run some I/O intensive workloads that were unrelated to Ray Cloud Browser during this period on my t1.micro instance. My resources are located in the availability zone us-east-1a.
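The arithmetic behind that lower bound can be written out as a short Python sketch. The prices are the ones quoted in the paragraph above; the variable and function names are only illustrative, and I/O requests and data transfer are deliberately left out because they are hard to predict.

```python
# Rough lower bound on the monthly bill, using the figures quoted above.
HOURS_PER_MONTH = 365.25 / 12 * 24   # average month: 730.5 hours
SPOT_PRICE_PER_HOUR = 0.003          # $/hour, t1.micro spot price used above
EBS_PRICE_PER_GIB_MONTH = 0.10       # $/GiB-month, EBS Standard volume
EBS_SIZE_GIB = 64


def monthly_floor():
    """Return (instance cost, storage cost, sum) in dollars for one month."""
    instance = HOURS_PER_MONTH * SPOT_PRICE_PER_HOUR
    storage = EBS_SIZE_GIB * EBS_PRICE_PER_GIB_MONTH
    return instance, storage, instance + storage


instance, storage, total = monthly_floor()
print(f"instance: {instance:.4f} $, storage: {storage:.4f} $, total: {total:.4f} $")
```

The floor is about 8.59 $ per month before any I/O or data transfer charges.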
Here is the pricing history for t1.micro spot instances in the us-east-1a availability zone.
Figure 1: Pricing history for t1.micro spot instances in the us-east-1a availability zone. Data are from Amazon Web Services API. The vertical axis is a by-the-hour price.
Using the Cost Allocation Report feature, I generated (with a home-baked Ruby script) pivot tables for my operating costs in the cloud during February 2013 and the first weeks of March 2013.
Table 1: Cost Allocation report for the period 2013-02-01 to 2013-02-28.
Project=Ray-Cloud-Browser-public-demo

| Product Code | Usage Type | Units | Usage Quantity | Total Cost ($) |
|---|---|---|---|---|
| AWSDataTransfer | DataTransfer-In-Bytes | GB | 0E-8 | 0.0 |
| AWSDataTransfer | DataTransfer-In-Bytes | GB | 0.33359259 | 0.0 |
| AmazonEC2 | SpotUsage:t1.micro | instance-hours | 550.00000000 | 1.652045 |
| AmazonEC2 | EBS:VolumeUsage | GB-months | 58.92857279 | 5.893582 |
| AmazonEC2 | EBS:VolumeIOUsage | I/O requests | 66575670.00000000 | 6.656853 |
| AWSDataTransfer | DataTransfer-Out-Bytes | GB | 0E-8 | 0.0 |
| AWSDataTransfer | DataTransfer-Out-Bytes | GB | 0.54077656 | 0.0 |
| AWSDataTransfer | DataTransfer-Out-Bytes | GB | 0E-8 | 0.0 |
| AWSDataTransfer | DataTransfer-Out-Bytes | GB | 3.18717216 | 0.383951 |
| AWSDataTransfer | DataTransfer-Regional-Bytes | GB | 0.00054671 | 7.3e-05 |
| Total | | | | 14.586504 |
Table 2: Cost Allocation report for the period 2013-03-01 to 2013-03-22.
Project=Ray-Cloud-Browser-public-demo

| Product Code | Usage Type | Units | Usage Quantity | Total Cost ($) |
|---|---|---|---|---|
| AWSDataTransfer | DataTransfer-Regional-Bytes | GB | 0.00050045 | 0.009971 |
| AWSDataTransfer | DataTransfer-In-Bytes | GB | 5.56271202 | 0.0 |
| AmazonEC2 | SpotUsage:t1.micro | instance-hours | 515.99615014 | 1.681617 |
| AmazonEC2 | EBS:VolumeUsage | GB-months | 50.03225975 | 5.003363 |
| AmazonEC2 | EBS:VolumeIOUsage | I/O requests | 135159802.27951210 | 13.515314 |
| AWSDataTransfer | DataTransfer-Out-Bytes | GB | 10.11928072 | 1.215252 |
| AWSDataTransfer | DataTransfer-Out-Bytes | GB | 0.28797433 | 0.0 |
| Total | | | | 21.425517 |
2013-02-28
Introducing genome subway maps
It's no secret: data visualization is more appealing than bare tables of floating-point numbers and integers. And visualization can be dynamic and responsive too, if designed correctly. In November 2012, I started to work on a pet project called Ray Cloud Browser. From the name, you might think that it's something to browse stuff related to astronomy: rays and clouds. In fact, that's untrue. Ray Cloud Browser is a data browser that can run in the cloud -- an abstraction for virtualized hardware that you pay for by the hour. Ray is just the brand name of the products I am working on during my doctoral projects.
Ray Cloud Browser is open source and free software. It's all on GitHub with nice documentation and all that. Anyway, enough with the chitchat.
The first picture I want to share is this view that illustrates repeated regions in a genome. It looks a lot like a subway map, hence the title of this post.
You can visit this subway location by yourself here. The demo is running on a t1.micro spot instance on Amazon EC2.
It's even possible (boom!) to have a menu when navigating this genetic map in the cloud.
The visual landscape of regions that are unique in a genome (or in a metagenome, transcriptome, or whatever -ome you deem the best for you) is calmer and simpler, like the one below.
In the scientific literature, repeats are usually described as simple branching points in the string graph. Well, some of them are simple (such as the one below in the picture), but most of them are complex with repeats within repeats (worlds within worlds).
My backlog is almost depleted, meaning that soon Ray Cloud Browser will be full of features.
There is a short guide on how to deploy this super-cool software for your own use.
2013-02-25
Building a client for visualizing graphs, in a browser
A graph has a set of vertices and a set of edges. An edge is a relationship between two vertices.
If you take Facebook, the vertices are people and the edges are friendships. If you take two people on Facebook, they are probably connected by just a few links -- like in pretty much every discrete system known to mankind. A path (like that path between two people) is the second class of interesting objects for visualizing a given system, the first class being graphs.
With graphs and paths, it is possible to describe numerous discrete systems.
In Ray Cloud Browser, a graph visualizer for genomics, vocabulary terms were carefully selected. The 4 main object types are maps, sections, regions, and locations. A map is a graph in genomics. The vertices of a map are DNA sequences (like GATTACA), and edges are direct neighbourhood relationships (such as GATTACA -> ATTACAG). A section is really just a bunch of paths in the graph. The paths in the map are called regions. Several locations can be explored in a region.
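To make the vertex and edge vocabulary concrete, here is a minimal Python sketch that cuts a DNA sequence into k-mers (the vertices) and links each k-mer to the next one (the edges). The function names are hypothetical and this is not the Ray Cloud Browser code; it only reproduces the GATTACA -> ATTACAG example.

```python
def kmers(sequence, k):
    """Return the k-mers of a sequence, in order of appearance."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def edges(sequence, k):
    """Pair each k-mer with the next one; neighbours overlap by k - 1 letters."""
    vertices = kmers(sequence, k)
    return list(zip(vertices, vertices[1:]))


print(edges("GATTACAG", 7))
```

With k = 7, the sequence GATTACAG yields exactly one edge, from GATTACA to ATTACAG.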
The geometrical landscape of data in Ray Cloud Browser is quite easy to browse because any map has an index associated with it, and it's the same for sections, regions, and locations. For instance, {"map": 0, "section": 3, "region": 5, "location": 3000} will get you somewhere in a genome.
Mathematically, there is a mapping from the set of locations (each one a 4-tuple of integers: map, section, region, location) to the union of the set of all possible sequences and the set containing only the nil object. In the other direction, for a given sequence, a set of 4-tuples (like those described above) can be obtained.
When the operator is at the end of a given region (for example, a contig), it is useful to find the nearby regions in the map. To do so, the web service must have an action to search for regions associated with any sequence (remember, sequences are vertices in the map).
This is about to become a reality in Ray Cloud Browser. It is exciting to reach this significant milestone after 4 months of relentless work.
In my backlog, I have only 5 tasks remaining! Yay!
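The reverse search from a sequence to its locations can be pictured as an index keyed by k-mer. The sketch below is an assumption about how such an index could be built in memory; the `Location` type and `build_annotations` function are hypothetical names, not part of Ray Cloud Browser.

```python
from collections import defaultdict
from typing import NamedTuple


class Location(NamedTuple):
    """The 4-tuple (map, section, region, location) described above."""
    map: int
    section: int
    region: int
    location: int


def build_annotations(map_index, section_index, regions, k):
    """Index every k-mer of every region by the locations where it occurs."""
    index = defaultdict(set)
    for region_index, sequence in enumerate(regions):
        for position in range(len(sequence) - k + 1):
            kmer = sequence[position:position + k]
            index[kmer].add(Location(map_index, section_index, region_index, position))
    return index


# Two toy regions that share the 5-mer TTACA.
annotations = build_annotations(0, 0, ["GATTACAG", "TTACAGG"], 5)
print(sorted(annotations["TTACA"]))
```

A k-mer shared by two regions maps to two locations, which is what lets the client discover the regions near the end of a contig.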
I will use this new powerful feature to better understand what's going on in various Ray issues.
- store path data inside Region class (data engine) (20 min)
- push other paths in region list when receiving annotations (data engine) (30 min)
- do readahead for other paths too (data engine) (30 min)
- select region in menu (UI) (30 min)
- paths in other colors (rendering) (30 min)
2013-02-22
Using Cost Allocation Report on Amazon Web Services (AWS)
AWS offers web services like compute instances. Lately, I have been using one cc2.8xlarge instance for 3 hours on a weekly basis to give training sessions. My 14 students connect to orion.cloud.raytrek.com (a canonical name for my AWS instance) during every training session. The instance has one additional 300 GiB EBS volume attached to it so that my students keep their data for the whole duration of the training program.
On AWS, I can tag anything I use: EC2 instances, EC2 EBS volumes, S3 buckets, and so on. A tag is a key and a value (key=value), for example Project=Ray-Cloud-Browser-public-demo. On AWS, it's possible to activate a feature called Cost Allocation Report. This feature deposits detailed usage reports in one S3 bucket that I own. These reports include costs.
I tagged my Cost Allocation Report S3 bucket with Project=Billing to get a grasp on how much it costs to use the Cost Allocation Report feature. The cost of getting my Cost Allocation Report files is less than $0.03.
I wrote a Ruby script that generates pivot tables for my projects. Below are my Cost Allocation Report tables with some confidential information redacted. Things under Project=not-classified are things that were not tagged.
Project=ray-in-cloud-cc2.8xlarge-CLI

| Product Code | Usage Type | Units | Usage Quantity | Total Cost ($) |
|---|---|---|---|---|
| AWSDataTransfer | DataTransfer-Regional-Bytes | GB | 2.8E-7 | 0.0 |
| AWSDataTransfer | DataTransfer-In-Bytes | GB | 0.00006033 | 0.0 |
| AmazonEC2 | SpotUsage:cc2.8xlarge | instance-hours | 10.00000000 | 2.7 |
| AWSDataTransfer | DataTransfer-Out-Bytes | GB | 0.00002029 | 0.0 |
| AWSDataTransfer | DataTransfer-Out-Bytes | GB | 0.00004414 | 5.0e-06 |
| Total | | | | 2.700005 |

Project=Ray-Cloud-Browser-##############

| Product Code | Usage Type | Units | Usage Quantity | Total Cost ($) |
|---|---|---|---|---|
| AWSDataTransfer | DataTransfer-Regional-Bytes | GB | 0.00004765 | 6.0e-06 |
| AWSDataTransfer | DataTransfer-In-Bytes | GB | 0.00000286 | 0.0 |
| AWSDataTransfer | DataTransfer-In-Bytes | GB | 8.65691169 | 0.0 |
| AmazonEC2 | SpotUsage:t1.micro | instance-hours | 338.75138122 | 1.017348 |
| AmazonEC2 | EBS:VolumeUsage | GB-months | 36.33402209 | 3.633367 |
| AmazonEC2 | EBS:VolumeIOUsage | I/O requests | 3861215.84788799 | 0.385104 |
| AWSDataTransfer | DataTransfer-Out-Bytes | GB | 0.00000230 | 0.0 |
| AWSDataTransfer | DataTransfer-Out-Bytes | GB | 0.18180145 | 0.0 |
| AWSDataTransfer | DataTransfer-Out-Bytes | GB | 0.00000500 | 1.0e-06 |
| AWSDataTransfer | DataTransfer-Out-Bytes | GB | 0.39561271 | 0.047268 |
| Total | | | | 5.083094 |

Project=formation-#############-bioinformatique-hiver-2013

| Product Code | Usage Type | Units | Usage Quantity | Total Cost ($) |
|---|---|---|---|---|
| AWSDataTransfer | DataTransfer-Regional-Bytes | GB | 0.00045277 | 6.0e-05 |
| AWSDataTransfer | DataTransfer-In-Bytes | GB | 3.90998689 | 0.0 |
| AmazonEC2 | BoxUsage:cc2.8xlarge | instance-hours | 10.00000000 | 24.0 |
| AmazonEC2 | EBS:VolumeUsage | GB-months | 145.47762856 | 14.547622 |
| AmazonEC2 | EBS:VolumeIOUsage | I/O requests | 267824.57214411 | 0.026712 |
| AWSDataTransfer | DataTransfer-Out-Bytes | GB | 0.10216341 | 0.0 |
| AWSDataTransfer | DataTransfer-Out-Bytes | GB | 0.22231474 | 0.026562 |
| Total | | | | 38.600956 |

Project=not-classified

| Product Code | Usage Type | Units | Usage Quantity | Total Cost ($) |
|---|---|---|---|---|
| AWSDataTransfer | DataTransfer-Regional-Bytes | GB | 0.07435498 | 0.009904 |
| AWSDataTransfer | DataTransfer-Regional-Bytes | GB | 0.00000645 | 1.0e-06 |
| AWSDataTransfer | DataTransfer-In-Bytes | GB | 0.00165190 | 0.0 |
| AWSDataTransfer | DataTransfer-In-Bytes | GB | 0.04349139 | 0.0 |
| AWSDataTransfer | DataTransfer-In-Bytes | GB | 0.04270176 | 0.0 |
| AWSDataTransfer | DataTransfer-In-Bytes | GB | 0.04325174 | 0.0 |
| AWSDataTransfer | DataTransfer-In-Bytes | GB | 5.47499061 | 0.0 |
| AWSDataTransfer | DataTransfer-In-Bytes | GB | 0.00000197 | 0.0 |
| AWSDataTransfer | DataTransfer-In-Bytes | GB | 0.00018585 | 0.0 |
| AmazonS3 | Requests-Tier1 | HTTP requests | 26.42622951 | 0.001066 |
| AmazonEC2 | BoxUsage:cc2.8xlarge | instance-hours | 1.00000000 | 2.4 |
| AmazonEC2 | SpotUsage:t1.micro | instance-hours | 1.02651934 | 0.003083 |
| AmazonEC2 | SpotUsage:t1.micro | instance-hours | 250.47071823 | 0.752221 |
| AmazonEC2 | EBS:VolumeUsage | GB-months | 28.04440006 | 2.804413 |
| AmazonEC2 | DataProcessing-Bytes | | 0.00146411 | 0.01 |
| AmazonSNS | Requests-Tier1 | HTTP requests | 459.00000000 | 0.0 |
| AmazonEC2 | EBS:VolumeIOUsage | I/O requests | 3051481.50818382 | 0.304344 |
| AmazonEC2 | SpotUsage:cr1.8xlarge | instance-hours | 9.00000000 | 3.09 |
| AmazonEC2 | LoadBalancerUsage | | 1.00000000 | 0.03 |
| AmazonEC2 | SpotUsage:cc2.8xlarge | instance-hours | 10.00000000 | 2.7 |
| AWSDataTransfer | DataTransfer-Out-Bytes | GB | 0.00025260 | 0.0 |
| AWSDataTransfer | DataTransfer-Out-Bytes | GB | 0.00293550 | 0.0 |
| AWSDataTransfer | DataTransfer-Out-Bytes | GB | 0.00102118 | 0.0 |
| AWSDataTransfer | DataTransfer-Out-Bytes | GB | 0.00026375 | 0.0 |
| AWSDataTransfer | DataTransfer-Out-Bytes | GB | 0.14774928 | 0.0 |
| AWSDataTransfer | DataTransfer-Out-Bytes | GB | 4.3E-7 | 0.0 |
| AWSDataTransfer | DataTransfer-Out-Bytes | GB | 0.00040591 | 0.0 |
| AWSDataTransfer | DataTransfer-Out-Bytes | GB | 0.00054968 | 6.6e-05 |
| AWSDataTransfer | DataTransfer-Out-Bytes | GB | 0.00638785 | 0.000763 |
| AWSDataTransfer | DataTransfer-Out-Bytes | GB | 0.00222216 | 0.000266 |
| AWSDataTransfer | DataTransfer-Out-Bytes | GB | 0.00057394 | 6.9e-05 |
| AWSDataTransfer | DataTransfer-Out-Bytes | GB | 0.32151280 | 0.038415 |
| AWSDataTransfer | DataTransfer-Out-Bytes | GB | 9.4E-7 | 0.0 |
| AWSDataTransfer | DataTransfer-Out-Bytes | GB | 0.00088329 | 0.000106 |
| AmazonEC2 | SpotUsage:cr1.8xlarge | instance-hours | 3.00000000 | 1.03 |
| AmazonEC2 | SpotUsage:cc2.8xlarge | instance-hours | 1.00000000 | 0.27 |
| Total | | | | 13.444717 |

Project=Ray-Cloud-Browser-public-demo

| Product Code | Usage Type | Units | Usage Quantity | Total Cost ($) |
|---|---|---|---|---|
| AWSDataTransfer | DataTransfer-Regional-Bytes | GB | 0.00021335 | 2.8e-05 |
| AWSDataTransfer | DataTransfer-In-Bytes | GB | 0.00002007 | 0.0 |
| AWSDataTransfer | DataTransfer-In-Bytes | GB | 0.22613437 | 0.0 |
| AmazonEC2 | SpotUsage:t1.micro | instance-hours | 338.75138122 | 1.017348 |
| AmazonEC2 | EBS:VolumeUsage | GB-months | 36.33402209 | 3.633367 |
| AmazonEC2 | EBS:VolumeIOUsage | I/O requests | 3726756.78467940 | 0.371693 |
| AWSDataTransfer | DataTransfer-Out-Bytes | GB | 0.00001613 | 0.0 |
| AWSDataTransfer | DataTransfer-Out-Bytes | GB | 0.56092834 | 0.0 |
| AWSDataTransfer | DataTransfer-Out-Bytes | GB | 0.00003510 | 4.0e-06 |
| AWSDataTransfer | DataTransfer-Out-Bytes | GB | 1.22061940 | 0.145841 |
| Total | | | | 5.168281 |

Project=ray-in-cloud-cc2.8xlarge

| Product Code | Usage Type | Units | Usage Quantity | Total Cost ($) |
|---|---|---|---|---|
| AWSDataTransfer | DataTransfer-In-Bytes | GB | 0.04343685 | 0.0 |
| AmazonEC2 | SpotUsage:cc2.8xlarge | instance-hours | 10.00000000 | 2.7 |
| AWSDataTransfer | DataTransfer-Out-Bytes | GB | 0.00232737 | 0.0 |
| AWSDataTransfer | DataTransfer-Out-Bytes | GB | 0.00506452 | 0.000605 |
| Total | | | | 2.700605 |

Project=Billing

| Product Code | Usage Type | Units | Usage Quantity | Total Cost ($) |
|---|---|---|---|---|
| AWSDataTransfer | DataTransfer-In-Bytes | GB | 0.00301682 | 0.0 |
| AmazonS3 | Requests-Tier1 | HTTP requests | 221.57377049 | 0.008934 |
| AmazonS3 | Requests-Tier2 | HTTP requests | 362.00000000 | 0.01 |
| AmazonS3 | TimedStorage-ByteHrs | GB | 0.00004882 | 0.01 |
| AWSDataTransfer | DataTransfer-Out-Bytes | GB | 0.00011206 | 0.0 |
| AWSDataTransfer | DataTransfer-Out-Bytes | GB | 0.00024386 | 2.9e-05 |
| Total | | | | 0.028963 |

Project=Ray-TestSuite

| Product Code | Usage Type | Units | Usage Quantity | Total Cost ($) |
|---|---|---|---|---|
| AmazonEC2 | EBS:VolumeUsage | GB-months | 0.01230827 | 0.001231 |
| AmazonEC2 | EBS:VolumeIOUsage | I/O requests | 21529.28710468 | 0.002147 |
| Total | | | | 0.003378 |

Project=###############-instance-testing

| Product Code | Usage Type | Units | Usage Quantity | Total Cost ($) |
|---|---|---|---|---|
| AmazonEC2 | BoxUsage:hs1.8xlarge | instance-hours | 1.00000000 | 4.6 |
| Total | | | | 4.6 |
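The pivot itself is simple: group the report's line items by their Project tag and sum the costs. The Ruby script is not shown in this post, so the following is only a Python sketch of the idea; the column names (`user:Project`, `ProductCode`, `UsageType`, `TotalCost`) are assumptions about the report layout, and the sample rows are copied from the tables above.

```python
from collections import defaultdict


def pivot_by_project(rows, tag="user:Project"):
    """Group line items by project tag; untagged items go to 'not-classified'."""
    tables = defaultdict(list)
    for row in rows:
        project = row.get(tag) or "not-classified"
        tables[project].append(row)
    return tables


def total_cost(items):
    """Sum the costs of one table, rounded to hide floating-point noise."""
    return round(sum(float(item["TotalCost"]) for item in items), 6)


# Sample line items (column names are assumed, values come from the tables above).
rows = [
    {"user:Project": "Ray-TestSuite", "ProductCode": "AmazonEC2",
     "UsageType": "EBS:VolumeUsage", "TotalCost": "0.001231"},
    {"user:Project": "Ray-TestSuite", "ProductCode": "AmazonEC2",
     "UsageType": "EBS:VolumeIOUsage", "TotalCost": "0.002147"},
    {"user:Project": "", "ProductCode": "AmazonEC2",
     "UsageType": "LoadBalancerUsage", "TotalCost": "0.03"},
]

for project, items in pivot_by_project(rows).items():
    print(f"Project={project} Total={total_cost(items)}")
```

Rounding the sum to 6 decimals avoids totals such as 38.600956000000004 that plain floating-point addition produces.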
People at Amazon.com, Inc. always say that they are obsessed with their customers. I am a happy customer of Amazon Web Services, Inc. (AWS), and I can confirm that AWS makes things really easy for the customer in many ways, such as the Cost Allocation Report.
Cost Allocation Report is really a feature for the customer that allows a better understanding of costs in the cloud. AWS could have charged a lot for that kind of feature -- banks charge their customers a lot for getting account statements from 3 years ago.
P.S.: I have no financial or commercial links with AWS; I am really just one happy AWS customer. I really think that AWS is giving me a great service for the money I give them. It's a win-win situation.
Big milestone reached for Ray Cloud Browser
It's almost March, and yet another milestone for Ray Cloud Browser was successfully reached this week. The data model of this software is composed of 4 types of objects: a map (a DNA k-mer graph with a name), a section (a group of DNA sequences which are called regions), a region (a DNA sequence), and a location (a position in a region).
Ray Cloud Browser is a distributed application: some parts run in your browser, and some other parts run in the cloud (or your other favorite place to host your infrastructure). The client is in JavaScript and HTML5 and runs in a web browser. The web service is in C++ and runs atop a web server.
The web service is implemented in C++ and is really efficient. There are 3 binary file formats (each with an ASCII version that can be converted). The first is the map, which contains all the k-mers of a sample, their coverage, their parents and their children. Any k-mer can be obtained in logarithmic time using the C++ API of this file format. The second file format is the region file format. It allows the retrieval of parts of any region in constant time (each operation is constant time, so fetching N locations of a section will perform O(N) operations, obviously). The last format implements annotations. Annotations allow a reverse search. With annotations, it's easy and fast to get a list of locations (map, section, region, location) for any k-mer. This is necessary to have a rich user experience in the HTML5 client where several regions are to be rendered in the user interface.
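The two access patterns can be illustrated in a few lines of Python. This is only a sketch under an assumption: that logarithmic lookup comes from a binary search over k-mers kept in sorted order, which the post does not state explicitly. The function names are hypothetical, and the real file formats are binary and accessed through the C++ API.

```python
from bisect import bisect_left


def find_kmer(sorted_kmers, kmer):
    """Binary search over sorted k-mers: O(log n) comparisons."""
    i = bisect_left(sorted_kmers, kmer)
    if i < len(sorted_kmers) and sorted_kmers[i] == kmer:
        return i
    return None


def fetch_region_slice(region, location, readahead):
    """Direct addressing: each position is reached in constant time."""
    return region[location:location + readahead]


store = sorted(["GATTACA", "ATTACAG", "TTACAGG"])
print(find_kmer(store, "GATTACA"), find_kmer(store, "CCCCCCC"))
print(fetch_region_slice("GATTACAGG", 2, 4))
```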
For the end user, the starting point is
http://smart.cloud.raytrek.com:55001/client/
The port 55001 is just because I am using IBM SmartCloud. Usually, the port is implicit and it's 80.
Below, each HTTP query is followed by its HTTP response. In some cases, I truncated the message body.
The HTTP query for the first communication follows.
GET /client/ HTTP/1.1
Host: smart.cloud.raytrek.com:55001
HTTP/1.1 200 OK
Date: Fri, 22 Feb 2013 04:56:58 GMT
Server: Apache/2.2.3 (CentOS)
Last-Modified: Wed, 20 Feb 2013 03:37:53 GMT
ETag: "102a33-9f5-4d61faedea640"
Accept-Ranges: bytes
Content-Length: 2549
Connection: close
Content-Type: text/html; charset=UTF-8
Ray Cloud Browser: interactively skim processed genomics data with energy
(message body is truncated)
This returns HTML content, and the client then fetches all the required JavaScript files and so on.
The first HTTP query performed by the client returns the list of maps and associated sections for each map.
GET /server/?tag=RAY_MESSAGE_TAG_GET_MAPS HTTP/1.1
Host: smart.cloud.raytrek.com:55001
HTTP/1.1 200 OK
Date: Fri, 22 Feb 2013 04:54:03 GMT
Server: Apache/2.2.3 (CentOS)
X-Powered-By: Ray Cloud Browser by Ray Technologies
Access-Control-Allow-Origin: *
Connection: close
Transfer-Encoding: chunked
Content-Type: application/json
{"maps": [
{ "name": "Sample 2-3 2013-02-19-1",
"sections": [
{ "name": "contigs" } ,
{ "name": "scaffolds" } ,
{ "name": "seeds" } ,
{ "name": "extensions" }
] },
{ "name": "American eel 2013-01-31-8",
"sections": [
{ "name": "contigs" }
] }
]}
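On the client side, this response is all that is needed to number the maps and sections, since later queries refer to them by index. Here is a small Python sketch that parses the body shown above; the function name is hypothetical and the real client is written in JavaScript.

```python
import json

# Body of the RAY_MESSAGE_TAG_GET_MAPS response shown above.
response_body = """
{"maps": [
  { "name": "Sample 2-3 2013-02-19-1",
    "sections": [ { "name": "contigs" }, { "name": "scaffolds" },
                  { "name": "seeds" }, { "name": "extensions" } ] },
  { "name": "American eel 2013-01-31-8",
    "sections": [ { "name": "contigs" } ] }
]}
"""


def list_sections(body):
    """Yield (map index, map name, section index, section name) tuples."""
    document = json.loads(body)
    for map_index, entry in enumerate(document["maps"]):
        for section_index, section in enumerate(entry["sections"]):
            yield map_index, entry["name"], section_index, section["name"]


for item in list_sections(response_body):
    print(item)
```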
The next query fetches information about a particular map.
GET /server/?tag=RAY_MESSAGE_TAG_GET_MAP_INFORMATION&map=0 HTTP/1.1
Host: smart.cloud.raytrek.com:55001
HTTP/1.1 200 OK
Date: Fri, 22 Feb 2013 05:00:36 GMT
Server: Apache/2.2.3 (CentOS)
X-Powered-By: Ray Cloud Browser by Ray Technologies
Access-Control-Allow-Origin: *
Connection: close
Transfer-Encoding: chunked
Content-Type: application/json
{
"map": 0,
"kmerLength": 61,
"entries": 177593546
}
GET /server/?tag=RAY_MESSAGE_TAG_GET_REGIONS&map=0&section=0&first=0&readahead=4096 HTTP/1.1
Host: smart.cloud.raytrek.com:55001
Date: Fri, 22 Feb 2013 05:03:23 GMT
Server: Apache/2.2.3 (CentOS)
X-Powered-By: Ray Cloud Browser by Ray Technologies
Access-Control-Allow-Origin: *
Connection: close
Transfer-Encoding: chunked
Content-Type: application/json
{ "map": 0,
"section": 0,
"count": 31701,
"first": 0,
"readahead": 4096,
"regions": [
{"name":"contig-256000092 485463 nucleotides", "nucleotides":485463},
{"name":"contig-207000075 447363 nucleotides", "nucleotides":447363},
{"name":"contig-255000091 320321 nucleotides", "nucleotides":320321},
{"name":"contig-17 290352 nucleotides", "nucleotides":290352},
{"name":"contig-80 255554 nucleotides", "nucleotides":255554},
{"name":"contig-269000011 233955 nucleotides", "nucleotides":233955},
{"name":"contig-5 207507 nucleotides", "nucleotides":207507},
{"name":"contig-253000001 203979 nucleotides", "nucleotides":203979},
{"name":"contig-24 176868 nucleotides", "nucleotides":176868},
{"name":"contig-51 139462 nucleotides", "nucleotides":139462},
{"name":"contig-79 134613 nucleotides", "nucleotides":134613},
{"name":"contig-93 132985 nucleotides", "nucleotides":132985},
{"name":"contig-105 125302 nucleotides", "nucleotides":125302},
(message body is truncated)
The client can then ask for a bunch of k-mers for a given region.
GET /server/?tag=RAY_MESSAGE_TAG_GET_REGION_KMER_AT_LOCATION&map=0&section=0&region=4&location=2000&readahead=512 HTTP/1.1
Host: smart.cloud.raytrek.com:55001
HTTP/1.1 200 OK
Date: Fri, 22 Feb 2013 05:05:29 GMT
Server: Apache/2.2.3 (CentOS)
X-Powered-By: Ray Cloud Browser by Ray Technologies
Access-Control-Allow-Origin: *
Connection: close
Transfer-Encoding: chunked
Content-Type: application/json
{
"map": 0,
"section": 0,
"region": 4,
"kmerLength": 61,
"location": 2000,
"name":"contig-80 255554 nucleotides",
"nucleotides":255554,
"readahead": 512,
"vertices": [
{"position":1744,"value":"CCGGTCAAACGTACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTG"},
{"position":1745,"value":"CGGTCAAACGTACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGC"},
{"position":1746,"value":"GGTCAAACGTACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGCA"},
{"position":1747,"value":"GTCAAACGTACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGCAT"},
{"position":1748,"value":"TCAAACGTACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGCATG"},
{"position":1749,"value":"CAAACGTACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGCATGA"},
{"position":1750,"value":"AAACGTACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGCATGAA"},
{"position":1751,"value":"AACGTACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGCATGAAG"},
{"position":1752,"value":"ACGTACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGCATGAAGC"},
{"position":1753,"value":"CGTACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGCATGAAGCG"},
{"position":1754,"value":"GTACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGCATGAAGCGT"},
{"position":1755,"value":"TACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGCATGAAGCGTT"},
{"position":1756,"value":"ACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGCATGAAGCGTTA"},
{"position":1757,"value":"CATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGCATGAAGCGTTAT"},
{"position":1758,"value":"ATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGCATGAAGCGTTATC"},
(message body is truncated)
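In the response above, consecutive vertices overlap by k - 1 letters, so the nucleotide sequence of the window can be spelled back from the k-mers. The following Python sketch shows one way to do it; the function name is hypothetical and the toy input uses 7-mers instead of the 61-mers of the real response.

```python
def spell_region(vertices):
    """Rebuild the nucleotide sequence covered by consecutive k-mers."""
    ordered = sorted(vertices, key=lambda vertex: vertex["position"])
    sequence = ordered[0]["value"]
    for previous, current in zip(ordered, ordered[1:]):
        if current["position"] != previous["position"] + 1:
            raise ValueError("positions are not consecutive")
        if current["value"][:-1] != previous["value"][1:]:
            raise ValueError("k-mers do not overlap by k - 1 letters")
        # Each new k-mer contributes exactly one new letter.
        sequence += current["value"][-1]
    return sequence


toy = [
    {"position": 11, "value": "ATTACAG"},
    {"position": 10, "value": "GATTACA"},
    {"position": 12, "value": "TTACAGG"},
]
print(spell_region(toy))
```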
The last two queries in the HTTP API of Ray Cloud Browser allow the client to get the attributes of a k-mer and to get the annotations of a k-mer.
GET /server/?tag=RAY_MESSAGE_TAG_GET_KMER_FROM_STORE&map=0&object=CGGCGCTTCCCATCACCTTAAGTTATCCAGAGGACATATTTGTGATGGAATCACACATATC&depth=512 HTTP/1.1
Host: smart.cloud.raytrek.com:55001
HTTP/1.1 200 OK
Date: Fri, 22 Feb 2013 05:07:59 GMT
Server: Apache/2.2.3 (CentOS)
X-Powered-By: Ray Cloud Browser by Ray Technologies
Access-Control-Allow-Origin: *
Connection: close
Transfer-Encoding: chunked
Content-Type: application/json
{
"map": 0,
"object": "CGGCGCTTCCCATCACCTTAAGTTATCCAGAGGACATATTTGTGATGGAATCACACATATC",
"vertices": [
{
"value": "CGGCGCTTCCCATCACCTTAAGTTATCCAGAGGACATATTTGTGATGGAATCACACATATC",
"coverage": 144,
"parents": ["G"],
"children": ["G"]
},
{
"value": "GCGGCGCTTCCCATCACCTTAAGTTATCCAGAGGACATATTTGTGATGGAATCACACATAT",
"coverage": 155,
"parents": ["C", "T"],
"children": ["A", "C"]
},
(message body is truncated)
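The parents and children fields appear to hold single letters: prepending a parent letter (and dropping the last letter) gives the parent k-mer, and appending a child letter (and dropping the first letter) gives the child k-mer. That reading is an assumption, but it is consistent with the sample above, where the parent "G" of the first vertex yields the value of the second vertex. A Python sketch with hypothetical function names:

```python
def parent_kmers(value, parents):
    """Parent k-mers: prepend each parent letter and drop the last letter."""
    return [letter + value[:-1] for letter in parents]


def child_kmers(value, children):
    """Child k-mers: drop the first letter and append each child letter."""
    return [value[1:] + letter for letter in children]


print(parent_kmers("ATTACAG", ["G"]))
print(child_kmers("GATTACA", ["G", "T"]))
```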
GET /server/?tag=RAY_MESSAGE_TAG_GET_OBJECT_ANNOTATIONS&map=0&object=CGGCGCTTCCCATCACCTTAAGTTATCCAGAGGACATATTTGTGATGGAATCACACATATC HTTP/1.1
Host: smart.cloud.raytrek.com:55001
HTTP/1.1 200 OK
Date: Fri, 22 Feb 2013 05:10:08 GMT
Server: Apache/2.2.3 (CentOS)
X-Powered-By: Ray Cloud Browser by Ray Technologies
Access-Control-Allow-Origin: *
Connection: close
Transfer-Encoding: chunked
Content-Type: application/json
{
"results": [
{ "object": "CGGCGCTTCCCATCACCTTAAGTTATCCAGAGGACATATTTGTGATGGAATCACACATATC",
"annotations": [
{ "type": "LocationAnnotation", "section": 0, "region": 4, "location": 2000 }
]
}]}
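All the queries above share one shape: the /server/ endpoint, a tag, and a few parameters. Here is a minimal Python sketch that builds such URLs with the standard library; the helper name `build_query` is hypothetical, and the host is simply the demo address used in this post.

```python
from urllib.parse import urlencode

BASE_URL = "http://smart.cloud.raytrek.com:55001/server/"


def build_query(tag, **parameters):
    """Build the URL of one HTTP API query; the tag always comes first."""
    return BASE_URL + "?" + urlencode({"tag": tag, **parameters})


print(build_query("RAY_MESSAGE_TAG_GET_MAPS"))
print(build_query("RAY_MESSAGE_TAG_GET_REGION_KMER_AT_LOCATION",
                  map=0, section=0, region=4, location=2000, readahead=512))
```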
The data inside the web service are currently added and managed with RayCloudBrowser-client -- a command-line client that uses the Ray Cloud Browser C++ API. The available commands are:
RayCloudBrowser-client add-map
RayCloudBrowser-client add-section
RayCloudBrowser-client create-map
RayCloudBrowser-client create-map-annotations-with-section
RayCloudBrowser-client create-section
RayCloudBrowser-client describe-configuration
RayCloudBrowser-client describe-json-file
RayCloudBrowser-client describe-map
RayCloudBrowser-client describe-map-annotations
RayCloudBrowser-client describe-map-object
RayCloudBrowser-client describe-map-object-annotations
RayCloudBrowser-client describe-map-objects
RayCloudBrowser-client describe-map-with-region
RayCloudBrowser-client describe-section
Running any of these commands without arguments will give you a help page.
I think this visualization project is exciting and eventually, the command-line client for managing a deployment will be totally replaced by new actions available in the endpoint of the web service, like pushing new maps or new sections.
A really cool feature for the long-term vision is to have a web action in the HTTP API of Ray Cloud Browser to allow end users to push their FASTQ sequences directly into the cloud.
Something that I am really proud of with the HTTP API of Ray Cloud Browser is that it abstracts totally how the objects are actually stored by the web service.
For instance, a RAY_MESSAGE_TAG_GET_MAP_INFORMATION query with map=0 just tells the endpoint that it concerns map #0 in the list of maps returned by RAY_MESSAGE_TAG_GET_MAPS.
Right now, the storage engine uses memory-mapped files with O_RDONLY for open(), and PROT_READ and MAP_SHARED for mmap().
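The same read-only, shared mapping can be tried from Python on a Unix system, where the `mmap` module exposes the `MAP_SHARED` and `PROT_READ` flags. This is only an illustration of the system calls named above, not the C++ storage engine; the function name is hypothetical.

```python
import mmap
import os


def read_mapped(path, offset, length):
    """Map a non-empty file read-only and return a slice of its bytes."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Length 0 maps the whole file; pages are loaded lazily by the kernel.
        with mmap.mmap(fd, 0, flags=mmap.MAP_SHARED, prot=mmap.PROT_READ) as view:
            return view[offset:offset + length]
    finally:
        os.close(fd)
```

With a shared read-only mapping, several web service processes can read the same file while the kernel keeps a single copy of its pages in memory.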
Ray Cloud Browser is a distributed application: some parts run in your browser, and some other parts run in the cloud (or your other favorite place to host your infrastructure). The client is in Javascript and HTML5 and runs in a web browser. The web service is in C++ and runs atop a web server.
The web services is implemented in C++ and is really efficient. There are 3 file binary file formats (with ASCII version that can be converted). The first is the map, which contains all the k-mers of a sample, their coverage, their parents and their children. Any k-mer can obtained in a logarithmic time using the C++ API of this file format. The second file format is the region file format. It allows the retrieval of parts of any region in constant time (each operation is constant time, fetching N locations of a section will perform O(N) operations obviously). The last format is implements annotations. Annotations allow a reverse search. With annotations, it's easy and fast to get a list of locations (map, section, region, location) for any k-mer. This is necessary to have a rich user experience in the HTML5 client where several regions are to be rendered in the user interface.
For the end user, the starting point is
http://smart.cloud.raytrek.com:55001/client/
The port is 55001 only because I am using IBM SmartCloud. Usually, the port is implicit (80).
Below, the HTTP query is in red and the HTTP response is in blue. In some cases, I truncated the message body.
The HTTP query for the first communication follows.
GET /client/ HTTP/1.1
Host: smart.cloud.raytrek.com:55001
HTTP/1.1 200 OK
Date: Fri, 22 Feb 2013 04:56:58 GMT
Server: Apache/2.2.3 (CentOS)
Last-Modified: Wed, 20 Feb 2013 03:37:53 GMT
ETag: "102a33-9f5-4d61faedea640"
Accept-Ranges: bytes
Content-Length: 2549
Connection: close
Content-Type: text/html; charset=UTF-8
Ray Cloud Browser: interactively skim processed genomics data with energy
(message body is truncated)
This returns HTML content, and the client then fetches all the required JavaScript files and so on.
The first HTTP query performed by the client returns the list of maps and associated sections for each map.
GET /server/?tag=RAY_MESSAGE_TAG_GET_MAPS HTTP/1.1
Host: smart.cloud.raytrek.com:55001
HTTP/1.1 200 OK
Date: Fri, 22 Feb 2013 04:54:03 GMT
Server: Apache/2.2.3 (CentOS)
X-Powered-By: Ray Cloud Browser by Ray Technologies
Access-Control-Allow-Origin: *
Connection: close
Transfer-Encoding: chunked
Content-Type: application/json
{"maps": [
{ "name": "Sample 2-3 2013-02-19-1",
"sections": [
{ "name": "contigs" } ,
{ "name": "scaffolds" } ,
{ "name": "seeds" } ,
{ "name": "extensions" }
] },
{ "name": "American eel 2013-01-31-8",
"sections": [
{ "name": "contigs" }
] }
]}
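Here is a minimal Python sketch of how a client could turn this response into the numeric indexes used by the other queries. The map index is the position in the "maps" list; treating the section index as the position in "sections" is an assumption that matches the samples in this post. The function name list_sections is made up.

```python
import json

# Body of the RAY_MESSAGE_TAG_GET_MAPS response shown above.
response_body = """
{"maps": [
  { "name": "Sample 2-3 2013-02-19-1",
    "sections": [
      { "name": "contigs" }, { "name": "scaffolds" },
      { "name": "seeds" }, { "name": "extensions" } ] },
  { "name": "American eel 2013-01-31-8",
    "sections": [ { "name": "contigs" } ] }
]}
"""

def list_sections(body):
    """Return (map index, section index, map name, section name) tuples."""
    rows = []
    for map_index, entry in enumerate(json.loads(body)["maps"]):
        for section_index, section in enumerate(entry["sections"]):
            rows.append((map_index, section_index, entry["name"], section["name"]))
    return rows

rows = list_sections(response_body)
```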
The next query fetches information about a particular map.
GET /server/?tag=RAY_MESSAGE_TAG_GET_MAP_INFORMATION&map=0 HTTP/1.1
Host: smart.cloud.raytrek.com:55001
HTTP/1.1 200 OK
Date: Fri, 22 Feb 2013 05:00:36 GMT
Server: Apache/2.2.3 (CentOS)
X-Powered-By: Ray Cloud Browser by Ray Technologies
Access-Control-Allow-Origin: *
Connection: close
Transfer-Encoding: chunked
Content-Type: application/json
{
"map": 0,
"kmerLength": 61,
"entries": 177593546
}
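All the queries share the same shape: a tag plus a few parameters in the query string. A minimal Python sketch for building such URLs follows; the helper name build_query is hypothetical, and the base URL is simply the host and path from the transcripts above (the sketch only builds the URL and does not contact the server).

```python
from urllib.parse import urlencode

BASE_URL = "http://smart.cloud.raytrek.com:55001/server/"

def build_query(tag, **parameters):
    """Build the URL for one query: the tag first, then its parameters."""
    query = {"tag": tag}
    query.update(parameters)
    return BASE_URL + "?" + urlencode(query)

url = build_query("RAY_MESSAGE_TAG_GET_MAP_INFORMATION", map=0)
```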
GET /server/?tag=RAY_MESSAGE_TAG_GET_REGIONS&map=0&section=0&first=0&readahead=4096 HTTP/1.1
Host: smart.cloud.raytrek.com:55001
Date: Fri, 22 Feb 2013 05:03:23 GMT
Server: Apache/2.2.3 (CentOS)
X-Powered-By: Ray Cloud Browser by Ray Technologies
Access-Control-Allow-Origin: *
Connection: close
Transfer-Encoding: chunked
Content-Type: application/json
{ "map": 0,
"section": 0,
"count": 31701,
"first": 0,
"readahead": 4096,
"regions": [
{"name":"contig-256000092 485463 nucleotides", "nucleotides":485463},
{"name":"contig-207000075 447363 nucleotides", "nucleotides":447363},
{"name":"contig-255000091 320321 nucleotides", "nucleotides":320321},
{"name":"contig-17 290352 nucleotides", "nucleotides":290352},
{"name":"contig-80 255554 nucleotides", "nucleotides":255554},
{"name":"contig-269000011 233955 nucleotides", "nucleotides":233955},
{"name":"contig-5 207507 nucleotides", "nucleotides":207507},
{"name":"contig-253000001 203979 nucleotides", "nucleotides":203979},
{"name":"contig-24 176868 nucleotides", "nucleotides":176868},
{"name":"contig-51 139462 nucleotides", "nucleotides":139462},
{"name":"contig-79 134613 nucleotides", "nucleotides":134613},
{"name":"contig-93 132985 nucleotides", "nucleotides":132985},
{"name":"contig-105 125302 nucleotides", "nucleotides":125302},
(message body is truncated)
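With 31701 regions and a readahead of 4096, a client needs several queries to list a whole section. The Python sketch below assumes that "first" is the index of the first region returned and that "readahead" is the maximum number of regions per response; the post does not spell out these semantics, and the function name region_pages is made up.

```python
def region_pages(count, readahead):
    """Return (first, expected number of regions) for each query needed."""
    return [(first, min(readahead, count - first))
            for first in range(0, count, readahead)]

pages = region_pages(31701, 4096)
```

Under these assumptions, the section above is covered by 8 queries, and the last one returns the remaining 3029 regions.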
The client can then ask for a bunch of k-mers for a given region.
GET /server/?tag=RAY_MESSAGE_TAG_GET_REGION_KMER_AT_LOCATION&map=0&section=0&region=4&location=2000&readahead=512 HTTP/1.1
Host: smart.cloud.raytrek.com:55001
HTTP/1.1 200 OK
Date: Fri, 22 Feb 2013 05:05:29 GMT
Server: Apache/2.2.3 (CentOS)
X-Powered-By: Ray Cloud Browser by Ray Technologies
Access-Control-Allow-Origin: *
Connection: close
Transfer-Encoding: chunked
Content-Type: application/json
{
"map": 0,
"section": 0,
"region": 4,
"kmerLength": 61,
"location": 2000,
"name":"contig-80 255554 nucleotides",
"nucleotides":255554,
"readahead": 512,
"vertices": [
{"position":1744,"value":"CCGGTCAAACGTACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTG"},
{"position":1745,"value":"CGGTCAAACGTACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGC"},
{"position":1746,"value":"GGTCAAACGTACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGCA"},
{"position":1747,"value":"GTCAAACGTACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGCAT"},
{"position":1748,"value":"TCAAACGTACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGCATG"},
{"position":1749,"value":"CAAACGTACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGCATGA"},
{"position":1750,"value":"AAACGTACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGCATGAA"},
{"position":1751,"value":"AACGTACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGCATGAAG"},
{"position":1752,"value":"ACGTACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGCATGAAGC"},
{"position":1753,"value":"CGTACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGCATGAAGCG"},
{"position":1754,"value":"GTACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGCATGAAGCGT"},
{"position":1755,"value":"TACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGCATGAAGCGTT"},
{"position":1756,"value":"ACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGCATGAAGCGTTA"},
{"position":1757,"value":"CATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGCATGAAGCGTTAT"},
{"position":1758,"value":"ATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGCATGAAGCGTTATC"},
(message body is truncated)
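In this response, consecutive positions hold k-mers that overlap by k-1 nucleotides, so the nucleotide sequence of the window can be rebuilt from the vertices. Here is a minimal Python sketch using the first three vertices from the response; the function name merge_kmers is made up.

```python
def merge_kmers(vertices):
    """Rebuild the nucleotide sequence spanned by consecutive k-mers."""
    ordered = sorted(vertices, key=lambda vertex: vertex["position"])
    sequence = ordered[0]["value"]
    for previous, current in zip(ordered, ordered[1:]):
        if current["position"] != previous["position"] + 1:
            raise ValueError("positions are not consecutive")
        if current["value"][:-1] != previous["value"][1:]:
            raise ValueError("k-mers do not overlap by k-1 nucleotides")
        sequence += current["value"][-1]   # each k-mer adds one new nucleotide
    return sequence

vertices = [
    {"position": 1744, "value": "CCGGTCAAACGTACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTG"},
    {"position": 1745, "value": "CGGTCAAACGTACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGC"},
    {"position": 1746, "value": "GGTCAAACGTACATAACGAATGGTAGGATACAGGACGTATTTACCTTCACATTTGACTGCA"},
]
sequence = merge_kmers(vertices)
```

Note that the first position returned is 1744, which is 2000 - 512/2. This suggests that the readahead window is centered on the requested location, but that is only an observation from this sample.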
The last two queries in the HTTP API of Ray Cloud Browser allow the client to get the attributes of a k-mer and to get the annotations of a k-mer.
GET /server/?tag=RAY_MESSAGE_TAG_GET_KMER_FROM_STORE&map=0&object=CGGCGCTTCCCATCACCTTAAGTTATCCAGAGGACATATTTGTGATGGAATCACACATATC&depth=512 HTTP/1.1
Host: smart.cloud.raytrek.com:55001
HTTP/1.1 200 OK
Date: Fri, 22 Feb 2013 05:07:59 GMT
Server: Apache/2.2.3 (CentOS)
X-Powered-By: Ray Cloud Browser by Ray Technologies
Access-Control-Allow-Origin: *
Connection: close
Transfer-Encoding: chunked
Content-Type: application/json
{
"map": 0,
"object": "CGGCGCTTCCCATCACCTTAAGTTATCCAGAGGACATATTTGTGATGGAATCACACATATC",
"vertices": [
{
"value": "CGGCGCTTCCCATCACCTTAAGTTATCCAGAGGACATATTTGTGATGGAATCACACATATC",
"coverage": 144,
"parents": ["G"],
"children": ["G"]
},
{
"value": "GCGGCGCTTCCCATCACCTTAAGTTATCCAGAGGACATATTTGTGATGGAATCACACATAT",
"coverage": 155,
"parents": ["C", "T"],
"children": ["A", "C"]
},
(message body is truncated)
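The "parents" and "children" attributes hold single nucleotides rather than full k-mers. Assuming the usual de Bruijn graph convention (a parent symbol is prepended and the last nucleotide dropped; a child symbol is appended and the first nucleotide dropped), the neighbors can be rebuilt as in this Python sketch. The two vertices in the response above are consistent with that convention; the function names are made up.

```python
def parent_kmers(vertex):
    """k-mers that precede this vertex: symbol + first k-1 nucleotides."""
    return [symbol + vertex["value"][:-1] for symbol in vertex["parents"]]

def child_kmers(vertex):
    """k-mers that follow this vertex: last k-1 nucleotides + symbol."""
    return [vertex["value"][1:] + symbol for symbol in vertex["children"]]

# The two vertices shown in the response above.
first = {
    "value": "CGGCGCTTCCCATCACCTTAAGTTATCCAGAGGACATATTTGTGATGGAATCACACATATC",
    "coverage": 144, "parents": ["G"], "children": ["G"],
}
second = {
    "value": "GCGGCGCTTCCCATCACCTTAAGTTATCCAGAGGACATATTTGTGATGGAATCACACATAT",
    "coverage": 155, "parents": ["C", "T"], "children": ["A", "C"],
}
```

Here the only parent of the first vertex is the second vertex, and the first vertex is one of the two children of the second.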
GET /server/?tag=RAY_MESSAGE_TAG_GET_OBJECT_ANNOTATIONS&map=0&object=CGGCGCTTCCCATCACCTTAAGTTATCCAGAGGACATATTTGTGATGGAATCACACATATC HTTP/1.1
Host: smart.cloud.raytrek.com:55001
HTTP/1.1 200 OK
Date: Fri, 22 Feb 2013 05:10:08 GMT
Server: Apache/2.2.3 (CentOS)
X-Powered-By: Ray Cloud Browser by Ray Technologies
Access-Control-Allow-Origin: *
Connection: close
Transfer-Encoding: chunked
Content-Type: application/json
{
"results": [
{ "object": "CGGCGCTTCCCATCACCTTAAGTTATCCAGAGGACATATTTGTGATGGAATCACACATATC",
"annotations": [
{ "type": "LocationAnnotation", "section": 0, "region": 4, "location": 2000 }
]
}]}
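This is the reverse search: from a k-mer back to where it occurs. A minimal Python sketch that flattens such a response into (map, section, region, location) tuples follows. The map index is taken from the query, since the annotation itself does not carry it; the function name locations_for is made up.

```python
import json

def locations_for(map_index, response_body):
    """Return (map, section, region, location) for each LocationAnnotation."""
    locations = []
    for result in json.loads(response_body)["results"]:
        for annotation in result["annotations"]:
            if annotation["type"] == "LocationAnnotation":
                locations.append((map_index, annotation["section"],
                                  annotation["region"], annotation["location"]))
    return locations

# Body of the RAY_MESSAGE_TAG_GET_OBJECT_ANNOTATIONS response shown above.
body = """
{ "results": [
  { "object": "CGGCGCTTCCCATCACCTTAAGTTATCCAGAGGACATATTTGTGATGGAATCACACATATC",
    "annotations": [
      { "type": "LocationAnnotation", "section": 0, "region": 4, "location": 2000 }
    ] }
]}
"""
locations = locations_for(0, body)
```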
The data inside the web service are currently added and managed with RayCloudBrowser-client -- a command-line client that uses the Ray Cloud Browser C++ API. The available commands are:
RayCloudBrowser-client add-map
RayCloudBrowser-client add-section
RayCloudBrowser-client create-map
RayCloudBrowser-client create-map-annotations-with-section
RayCloudBrowser-client create-section
RayCloudBrowser-client describe-configuration
RayCloudBrowser-client describe-json-file
RayCloudBrowser-client describe-map
RayCloudBrowser-client describe-map-annotations
RayCloudBrowser-client describe-map-object
RayCloudBrowser-client describe-map-object-annotations
RayCloudBrowser-client describe-map-objects
RayCloudBrowser-client describe-map-with-region
RayCloudBrowser-client describe-section
Running any of these commands without arguments will give you a help page.
2013-02-20
Using canonical names for cloud instances
I am using these public cloud services:
| Product | Service Provider |
| Amazon Elastic Compute Cloud (EC2) | Amazon Web Services, Inc. (AWS) |
| Windows Azure Linux Virtual Machines | Microsoft Corporation |
| Rackspace Cloud Servers | Rackspace, U.S. Inc. |
| IBM SmartCloud® | IBM Corporation |
My canonical names:
| Name | Type | Value |
| browser.cloud.raytrek.com. | CNAME | ec2-23-23-55-35.compute-1.amazonaws.com. |
| thor.cloud.raytrek.com. | CNAME | 108-166-117-29.static.cloud-ips.com. |
| smart.cloud.raytrek.com. | CNAME | vhost0147.dc1.on.ca.compute.ihost.com. |
| azure.cloud.raytrek.com. | CNAME | ray-tech.cloudapp.net. |
| plp.cloud.raytrek.com. | CNAME | ec2-54-235-237-179.compute-1.amazonaws.com. |
| orion.cloud.raytrek.com. | CNAME | ec2-54-242-199-219.compute-1.amazonaws.com. |