CSDT work log by Lily: July 2011

Sunday, July 31, 2011

7/21 - 7/28

Done last week:

Improved the progress bar; I think it's as good as it needs to be for now. The black-boxing issue mentioned two weeks ago is still unsolved, but separate indicators for different processes now work.
Finished input verification
Added cancel feature
Added save feature
Implemented memory overflow protection
Improved algorithms a little
Started working on an automated option that analyzes several network files at once and compiles a chart of the results
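Here is a minimal sketch of how a long computation can report its own progress and honor a cancel request. It assumes the work splits into a known number of steps. `CancellableTask`, `doOneStep` and the percentage callback are hypothetical names, not the applet's actual classes.

Giving every task its own listener is one way to drive separate indicators for separate processes. In a Swing GUI the task would run off the event-dispatch thread, for example on a plain `Thread` or inside a `SwingWorker`. The listener would then hand updates back to the GUI thread, e.g. with `SwingUtilities.invokeLater`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

// Hypothetical sketch: a cancellable computation with its own progress listener.
public class CancellableTask implements Runnable {
    private final int steps;              // total units of work (assumed known up front)
    private final IntConsumer onProgress; // receives a percentage 0..100
    private volatile boolean cancelled = false;
    private volatile boolean finished = false;

    public CancellableTask(int steps, IntConsumer onProgress) {
        this.steps = steps;
        this.onProgress = onProgress;
    }

    /** Safe to call from any thread, e.g. a "Cancel" button handler. */
    public void cancel() { cancelled = true; }

    public boolean isFinished() { return finished; }

    @Override
    public void run() {
        for (int i = 0; i < steps; i++) {
            if (cancelled) return;             // stop early; finished stays false
            doOneStep(i);
            onProgress.accept((i + 1) * 100 / steps);
        }
        finished = true;
    }

    private void doOneStep(int i) {
        // Placeholder for one stage of the real work (e.g. one box size).
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> seen = new ArrayList<>();
        CancellableTask task = new CancellableTask(4, seen::add);
        Thread worker = new Thread(task);      // keep long work off the GUI thread
        worker.start();
        worker.join();
        System.out.println(seen + " finished=" + task.isFinished());
    }
}
```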

To do:

Finish automated option
Get a working copy to Jeffrey, the linguist
Try out another algorithm optimization I found
Work on the content for a website about networks & fractals

Progress was a little slow this week because of illness and computer troubles. Either problem alone would have been easy to get past, but together they were a bit much.

Thursday, July 21, 2011

This week I uploaded the new version of the African Fractal applet to the website and made some edits to the HTML so that everything runs smoothly.

For the network applet, I got a pretty good linearity measure working with the help of Jed.
I also implemented the generation of predefined types of graphs; now I want to add a save feature so users can reuse these graphs.
I added some input verification and made a tiny bit of progress fixing up the progress bar, but both these things need more work.
I've contacted a linguist interested in analyzing networks based on sentence construction, which I'm really excited about!

Besides the unfinished things just mentioned, I also want to work this week on:
experimenting with the algorithms
adding a cancel feature

Wednesday, July 13, 2011

7/6 - 7/13

Tasks accomplished this week:

Multiple networks can be loaded in one applet, to make comparisons easy. Also, many dimension-calculations can be run on each network without losing old results.
Threading is fixed, so the GUI always stays responsive
The interface in general is (hopefully) cleaner and more intuitive, though it's also more complicated now that it has more functionality
There's a progress bar to show progress of dimension calculation, which can take a while for larger networks.
File uploading is dramatically faster. I discovered that using + concatenation to build a big string from a file is a bad idea. Every concatenation copies the whole string built so far, so as the result grows, each new piece takes longer and longer to add, and the total time grows roughly quadratically with file size. Using a StringBuilder and .append() is much better. Files that used to take me almost a minute to read into a string now load in a blink.
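Here is a minimal sketch of the two approaches, assuming the file is read line by line with a `BufferedReader`. The class and method names are hypothetical. The slowdown is quadratic rather than literally exponential. Each `+=` copies everything built so far, while `StringBuilder.append` writes into a buffer that grows geometrically, so appending costs amortized constant time per character.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class ReadWholeFile {

    // Slow: each += allocates a new String and copies all previous text into it.
    static String readWithConcat(Reader source) {
        try (BufferedReader in = new BufferedReader(source)) {
            String result = "";
            String line;
            while ((line = in.readLine()) != null) {
                result += line + "\n";
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Fast: appends go into one growable buffer; the String is built once at the end.
    static String readWithBuilder(Reader source) {
        try (BufferedReader in = new BufferedReader(source)) {
            StringBuilder sb = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "<graph>\n<node id=\"1\"/>\n</graph>";
        String a = readWithConcat(new StringReader(text));
        String b = readWithBuilder(new StringReader(text));
        System.out.println(a.equals(b));
    }
}
```

With a real file, pass `Files.newBufferedReader(path)` or a `FileReader` as the `Reader`. Both methods close the reader they are given.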

Some screenshots:

What it looks like on startup.

The information about the network (n, k, D) is now contained in the tab for that network. The loading panel is always accessible for loading more, and old networks stay "alive" (you can keep running analyses on them if you like).

Analysis in progress.

Panels with results from a dimension calculation display the input used to create them, the computed dimension, and r. The r value indicates how far the data deviate from a straight line, and so roughly how reliable the dimension value is, though it isn't really the best measure of that. The "use D" button averages this value of D with any other computations of it that the user has chosen to "use". It displays the average above, where it now says "D: uncalculated". This way a user can experiment with different inputs and keep only the results from combinations that turn out to be helpful.
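Here is a sketch of how D and r could come out of the box-counting data. It assumes D is minus the slope of an ordinary least-squares line through the points (ln l_B, ln N_B), and that r is the Pearson correlation coefficient of those points. `BoxDimensionFit`, `fit` and `averageUsed` are hypothetical names. The plain arithmetic mean for "use D" is also an assumption. Because N_B falls as l_B grows, r comes out negative here; |r| near 1 means the points lie close to a line.

```java
import java.util.List;

public class BoxDimensionFit {

    /** Result of fitting ln N_B = a - D ln l_B. */
    static final class Fit {
        final double dimension; // D = -slope
        final double r;         // Pearson correlation of the log-log points
        Fit(double dimension, double r) { this.dimension = dimension; this.r = r; }
    }

    /** boxSizes[i] = l_B, boxCounts[i] = N_B(l_B); all > 0, same length >= 2. */
    static Fit fit(double[] boxSizes, double[] boxCounts) {
        int n = boxSizes.length;
        double[] x = new double[n], y = new double[n];
        double sx = 0, sy = 0;
        for (int i = 0; i < n; i++) {
            x[i] = Math.log(boxSizes[i]);
            y[i] = Math.log(boxCounts[i]);
            sx += x[i];
            sy += y[i];
        }
        double mx = sx / n, my = sy / n;
        double sxx = 0, syy = 0, sxy = 0;        // centered sums of squares/products
        for (int i = 0; i < n; i++) {
            double dx = x[i] - mx, dy = y[i] - my;
            sxx += dx * dx;
            syy += dy * dy;
            sxy += dx * dy;
        }
        double slope = sxy / sxx;                // least-squares slope
        double r = sxy / Math.sqrt(sxx * syy);   // Pearson r
        return new Fit(-slope, r);
    }

    /** What a "use D" button might do: average the D values the user kept. */
    static double averageUsed(List<Fit> used) {
        double sum = 0;
        for (Fit f : used) sum += f.dimension;
        return sum / used.size();
    }

    public static void main(String[] args) {
        // Synthetic data following N_B = 1000 * l_B^-2 exactly.
        double[] l = {1, 2, 4, 8};
        double[] nb = {1000, 250, 62.5, 15.625};
        Fit f = fit(l, nb);
        System.out.printf("D = %.3f, r = %.3f%n", f.dimension, f.r);
    }
}
```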

Things to work on:

Figure out a better way (besides "r") to measure how well a line fits the data (working with Jed on this tomorrow)
Add options in the loading panel to generate graphs of predefined types - random, trees, lattices, etc. - so users can compare the analysis of their networks with that of other models with the same n and k (number of nodes and average degree)
Experiment with algorithm speed improvement - look into optimizations and approximations
Make it sounder - make sure users can't overflow the memory and take care of bad input
Let the user cancel computations
Improve the progress indicator - parts of the computation are so black-boxed it's hard to accurately tell the progress without compromising encapsulation. Also if a user runs two different long computations at once they can only see the progress of the most recent one.
Keep looking into applications
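For the "random" option in the list above, here is a sketch assuming an Erdős-Rényi-style G(n, m) model. It picks m = round(n·k/2) distinct edges uniformly at random, which makes the average degree 2m/n as close to k as possible. `RandomGraph.generate` and its seed parameter are hypothetical, and trees and lattices would need their own generators.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class RandomGraph {

    /**
     * Random simple graph with n nodes and m = round(n*k/2) edges
     * (Erdos-Renyi G(n, m)), returned as adjacency lists.
     */
    static List<List<Integer>> generate(int n, double k, long seed) {
        long m = Math.round(n * k / 2.0);
        long maxEdges = (long) n * (n - 1) / 2;
        if (k < 0 || m > maxEdges) {
            throw new IllegalArgumentException("average degree k must be between 0 and n-1");
        }
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) adj.add(new ArrayList<>());
        Set<Long> edges = new HashSet<>();     // each undirected edge stored once
        Random rnd = new Random(seed);
        while (edges.size() < m) {
            int a = rnd.nextInt(n), b = rnd.nextInt(n);
            if (a == b) continue;              // no self-loops
            long key = (long) Math.min(a, b) * n + Math.max(a, b);
            if (edges.add(key)) {              // skip duplicate edges
                adj.get(a).add(b);
                adj.get(b).add(a);
            }
        }
        return adj;
    }

    public static void main(String[] args) {
        int n = 100;
        List<List<Integer>> g = generate(n, 4.0, 42L);
        int degreeSum = 0;
        for (List<Integer> nbrs : g) degreeSum += nbrs.size();
        System.out.println("average degree = " + degreeSum / (double) n);
    }
}
```

Rejection sampling like this is quick for sparse graphs (k much smaller than n), the usual case for real networks. It slows down as the graph approaches complete.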

Wednesday, July 6, 2011

6/26 - 7/6

Tasks accomplished:

Applet can now read and write graphs in XGMML.
GUI is more interactive - loading graphs and analyzing for dimension can now be done dynamically
I got data for the metabolic network of E. coli and the protein interaction network (PIN) for yeast from http://www-levich.engr.ccny.cuny.edu/~hmakse/soft_data.html and analyzed them. My results came very close to those others have reported (3.4 for E. coli and 1.9 for yeast). My results for E. coli are often a little high, but perhaps I'm not using exactly the same data. There seem to be several versions of these data sets; every source I look at gives a slightly different number of nodes.
Minor improvements to the data displays
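Here is a minimal sketch of the writing half, assuming a graph stored as adjacency lists with node ids 0..n-1. The class and method names are hypothetical. Only the core `graph`, `node` and `edge` elements are written; real XGMML files often carry extra `att` elements and graphics data. The namespace is the standard XGMML one.

```java
import java.util.Arrays;
import java.util.List;

public class XgmmlWriter {

    /** Escapes the five XML special characters for attribute values. */
    static String esc(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    /** Undirected graph as adjacency lists -> XGMML; each edge written once (source < target). */
    static String write(String label, List<List<Integer>> adj) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<graph label=\"").append(esc(label))
          .append("\" xmlns=\"http://www.cs.rpi.edu/XGMML\">\n");
        for (int i = 0; i < adj.size(); i++) {
            sb.append("  <node id=\"").append(i).append("\" label=\"").append(i).append("\"/>\n");
        }
        for (int i = 0; i < adj.size(); i++) {
            for (int j : adj.get(i)) {
                if (i < j) {
                    sb.append("  <edge source=\"").append(i)
                      .append("\" target=\"").append(j).append("\"/>\n");
                }
            }
        }
        sb.append("</graph>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<List<Integer>> triangle = Arrays.asList(
            Arrays.asList(1, 2), Arrays.asList(0, 2), Arrays.asList(0, 1));
        System.out.print(write("triangle", triangle));
    }
}
```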

Some screenshots of results:


E. coli. The horizontal axis (ignore the coordinate labels for now; this was before I fixed them) represents box size: the maximum path length between any two nodes in a "box", or sub-net. The vertical axis is the number of boxes needed to cover the network. For a fractal network, the number of boxes N_B falls off as a power of the box size l_B, N_B ~ l_B^(-D). On logarithmic axes (like these) the points should therefore approximate a straight line with slope -D. Notice the "D: 3.40..." Never before have I been so excited to see that number.
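The box counts plotted here can be produced in several ways. Below is a sketch of one published approach, greedy coloring, using the definition of box size given above: two nodes may share a box only if their shortest-path distance is at most the box size. It is not necessarily the algorithm the applet uses, and the class and method names are hypothetical. It computes all-pairs distances by breadth-first search, which takes O(n²) memory, so it suits small and medium networks.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GreedyBoxCovering {

    /** BFS distances from src; unreachable nodes get Integer.MAX_VALUE. */
    static int[] bfs(List<List<Integer>> adj, int src) {
        int[] dist = new int[adj.size()];
        Arrays.fill(dist, Integer.MAX_VALUE);
        dist[src] = 0;
        ArrayDeque<Integer> queue = new ArrayDeque<>();
        queue.add(src);
        while (!queue.isEmpty()) {
            int u = queue.poll();
            for (int v : adj.get(u)) {
                if (dist[v] == Integer.MAX_VALUE) {
                    dist[v] = dist[u] + 1;
                    queue.add(v);
                }
            }
        }
        return dist;
    }

    static int[][] allPairs(List<List<Integer>> adj) {
        int[][] d = new int[adj.size()][];
        for (int i = 0; i < adj.size(); i++) d[i] = bfs(adj, i);
        return d;
    }

    /** Nodes farther apart than boxSize must get different colors; colors = boxes. */
    static int countBoxes(int[][] dist, int boxSize) {
        int n = dist.length;
        int[] color = new int[n];
        int numColors = 0;
        for (int i = 0; i < n; i++) {
            boolean[] forbidden = new boolean[numColors + 1];
            for (int j = 0; j < i; j++) {
                if (dist[i][j] > boxSize) forbidden[color[j]] = true;
            }
            int c = 0;
            while (forbidden[c]) c++;          // smallest color not forbidden
            color[i] = c;
            numColors = Math.max(numColors, c + 1);
        }
        return numColors;
    }

    public static void main(String[] args) {
        // A 5-node path 0-1-2-3-4.
        List<List<Integer>> path = new ArrayList<>();
        for (int i = 0; i < 5; i++) path.add(new ArrayList<>());
        for (int i = 0; i + 1 < 5; i++) { path.get(i).add(i + 1); path.get(i + 1).add(i); }
        int[][] dist = allPairs(path);
        for (int size = 0; size <= 4; size++) {
            // prints 5, 3, 2, 2, 1 boxes for sizes 0..4
            System.out.println("box size " + size + ": " + countBoxes(dist, size) + " boxes");
        }
    }
}
```

The coloring is a valid covering, so its count is an upper bound on the true minimum. Because the count depends on node order, one could repeat it with shuffled orders and keep the smallest count. Only box sizes of 1 or more can go on a log axis.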


Same chart for the PIN of yeast. D = 1.91 - yay!

Yeast again; this time the vertical axis represents the calculated dimension at each stage of the algorithm. As mentioned in my previous post, for a fractal network these values should end up very close to a single value, and here they do.
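The post doesn't spell out exactly what "dimension at each stage" means. One plausible reading, which is an assumption and not necessarily what the applet plots, is the local slope between consecutive points of the log-log chart. For an exact power law every local slope equals D, so a small spread is a numeric version of the values "ending up very close to a single value". The names below are hypothetical.

```java
import java.util.Arrays;

public class LocalDimension {

    /**
     * Local slope between consecutive (box size, box count) points:
     * D_i = -ln(N_{i+1} / N_i) / ln(l_{i+1} / l_i).
     * For an exact power law N_B = A * l_B^(-D), every entry equals D.
     */
    static double[] localDimensions(double[] boxSizes, double[] boxCounts) {
        double[] d = new double[boxSizes.length - 1];
        for (int i = 0; i + 1 < boxSizes.length; i++) {
            d[i] = -Math.log(boxCounts[i + 1] / boxCounts[i])
                 / Math.log(boxSizes[i + 1] / boxSizes[i]);
        }
        return d;
    }

    /** Largest minus smallest estimate: small values mean the estimates agree. */
    static double spread(double[] values) {
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (double v : values) { min = Math.min(min, v); max = Math.max(max, v); }
        return max - min;
    }

    public static void main(String[] args) {
        double[] l = {1, 2, 4, 8};
        double[] nb = {1000, 250, 62.5, 15.625};   // synthetic N_B = 1000 * l_B^-2
        double[] d = localDimensions(l, nb);
        System.out.println(Arrays.toString(d) + " spread=" + spread(d));
    }
}
```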

Things to focus on now:

Further improve the GUI: draw the networks themselves, make the charts more readable and customizable
Look into more applications - find interesting networks to test (if anybody has any ideas, let me know!)