1. Why can't I use the VOSON System crawler?
When you applied for your VOSON account you were given the opportunity to apply for the "basic access" role and check the box saying that you agree to the terms and conditions for the basic access role (i.e. access to the crawler). If you did not do this during the account application process, you can do this now (via your account management page). Please note: we will not assign you to the basic access role and thus give you access to the crawler until you agree to the terms and conditions (by checking the checkbox).
2. When I click on the VOSON System link (after logging into the VOSON site, i.e. not as a guest user), the VOSON System starts up, but I'm still a guest user. I can't use the VOSON System as myself, even though I have a VOSON user account.
This might happen if you were using the VOSON System as a guest user just before trying to use it as yourself (i.e. not as a guest). Make sure you click the "logout" link in the upper-right corner of the VOSON System before trying to use it as yourself.
3. Is VOSON capable of snowballing the crawls, so that it crawls sites that are linked to by the seed sites, not just the seed sites?
VOSON doesn't automatically do snowball crawls because the network would get too big too quickly and a lot of the new sites would be rubbish (e.g. from crawls of adobe.com). Instead, users can take a manual snowball approach:
- input some seed sites and get them crawled;
- then use the composition crosstab function to create a subnetwork containing only the seeds plus "important" sites, where important sites are those with a degree of two or more (or higher if you like), i.e. they connect to two or more seeds (or else have a reciprocal relationship with one seed).
These important new sites are likely to be good candidates for new seed sites. You then open up the original voson database, add these new seeds and let the crawl happen again. You can keep adding new seeds in this way, thus doing a (manual) snowball data collection.
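The selection rule in the manual snowball approach can be sketched in Python. This is only an illustration and not part of the VOSON System: it assumes you have the crawled hyperlinks available as (source, target) pairs, and the function name candidate_seeds and the example hostnames are hypothetical. A non-seed site is kept when it has two or more links with seeds, so a reciprocal relationship with one seed counts as two.

```python
from collections import defaultdict


def candidate_seeds(edges, seeds, min_degree=2):
    """Return non-seed sites that have at least min_degree links with seed sites.

    edges: iterable of (source, target) directed hyperlinks between sites.
    seeds: the sites that were crawled as seeds.
    """
    seeds = set(seeds)
    degree = defaultdict(int)
    for source, target in set(edges):  # ignore duplicate links
        if source in seeds and target not in seeds:
            degree[target] += 1  # link from a seed to the site
        elif target in seeds and source not in seeds:
            degree[source] += 1  # link from the site to a seed
    return sorted(site for site, d in degree.items() if d >= min_degree)


# Hypothetical example data: two seeds and three discovered sites.
seeds = ["seed-a.example", "seed-b.example"]
edges = [
    ("seed-a.example", "hub.example"),
    ("seed-b.example", "hub.example"),      # linked to by two seeds
    ("seed-a.example", "partner.example"),
    ("partner.example", "seed-a.example"),  # reciprocal link with one seed
    ("seed-b.example", "noise.example"),    # a single link only
]
print(candidate_seeds(edges, seeds))
```

Raising min_degree gives a stricter filter, which may help when a crawl returns many marginal sites.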
4. I keep on getting an alert box saying "Problem. Function -drawWindow- is not known."
This might happen if you were using the VOSON System when the server had to be restarted (server restarts happen only very occasionally). The solution is to log out of the VOSON System (click the link in the upper-right corner) and completely close Firefox. Then restart Firefox, log in to the VOSON site and click the link to the VOSON System.
5. Is there any way to export higher quality network maps? I'd like to be able to incorporate them into a PowerPoint presentation and the standard export resolution is not high enough. Also, I can't get the networks to render with all the labels intact (not off screen), not overlapping, etc.
There are three ways to export/extract VOSON System network maps for inclusion in papers/reports:
(1) take a screenshot and then use cut/paste in an image editor to get just the network map (this will give you the worst quality image);
(2) use the "Download PNG" function in the VOSON map window (better quality, but still not great);
(3) download the "source code" (SVG - scalable vector graphics) for the map (there is a button for this in the map window) and convert the SVG file to your preferred image format (e.g. jpg, png, eps) using image conversion software such as GIMP or Inkscape (both are available for Linux and Windows).
Inkscape is particularly useful because you can use it either from the command line or via a GUI (what follows assumes you are using Inkscape version 0.46). The advantage of downloading the SVG source code for the map is that you can edit it and you can choose whatever resolution you want when saving your bitmap (jpg, png). You can edit the SVG file in a text editor (SVG is XML markup), but most edits you will probably want to make can be done via the image editor.
The two main edits you will want to make before saving as a bitmap (and both of these can be done in Inkscape i.e. you don't need to edit the source by hand) are:
(1) remove the border that VOSON places around the map (you can do this by clicking on the border in Inkscape and selecting Edit->Cut from the menu);
(2) fit the canvas to the image (if you have node labels, you may find they go off the canvas, i.e. aren't in the picture) - you do this by selecting (in Inkscape) File->Document Properties and then clicking the "fit page to selection" button. Then select File->Export Bitmap and choose the appropriate resolution (the default is 90 dpi, but you may want to increase it to, say, 180 depending on your purpose).
If you want to do the above via the text editor and command line version of Inkscape, you need to do the following (assuming you are using Linux):
(1) on the third line of the SVG file, change the width attribute to a value that doesn't cut off the labels;
(2) delete the three lines which are just before the last line of the SVG file;
(3) from the command line run (to save as png):
inkscape yourfile.svg -e yourfile.png -b "ffffff" -d 180
The problem of labels overlapping each other can't be fixed with the above approach. The only options are to have a sparser network or perhaps to show labels only for "known" sites, i.e. sites that are not coded "unknown" for the categorical attribute that is being used to colour the sites in the network map.
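If you have several maps to convert, the command-line step can be wrapped in a short script. The following Python sketch only builds and runs the same command quoted above with the same -e, -b and -d options; the function names are hypothetical, and it assumes an Inkscape 0.46-style command line is installed and on your PATH. Later Inkscape releases may use different export options, so check inkscape --help for your version.

```python
import os
import shutil
import subprocess


def inkscape_png_command(svg_path, png_path, dpi=180, background="ffffff"):
    """Build the Inkscape export command quoted in this FAQ as an argument list."""
    return ["inkscape", svg_path, "-e", png_path, "-b", background, "-d", str(dpi)]


def export_maps(svg_paths, dpi=180):
    """Export each SVG file to a PNG with the same base name; return the PNG paths."""
    if shutil.which("inkscape") is None:
        raise RuntimeError("inkscape was not found on the PATH")
    written = []
    for svg_path in svg_paths:
        png_path = os.path.splitext(svg_path)[0] + ".png"
        # check=True raises an error if Inkscape exits with a failure status
        subprocess.run(inkscape_png_command(svg_path, png_path, dpi), check=True)
        written.append(png_path)
    return written


# Show the command that would be run for one file, without running it.
print(" ".join(inkscape_png_command("yourfile.svg", "yourfile.png")))
```

Calling export_maps(["map1.svg", "map2.svg"]) would then write map1.png and map2.png at 180 dpi with a white background.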
6. How can I map the network of pages _within_ a site, rather than _between_ sites? For example, I want to map how Wikipedia pages focused on "social networks" are connected to one another. (version 0.5.15.0)
The VOSON System is designed for collecting and analysing hyperlink networks between websites, not within a given website. It _is_ possible, but it requires a few steps. Here is one approach for mapping the connections between Wikipedia pages focused on "social networks":
1. Create a new voson database and select to only collect outbound links (uncheck the "crawl inbound" checkbox). Set the depth of crawl to 1. Uncheck the "parse text content" and "collect favicon.ico" checkboxes.
2. Input the first seed page: http://en.wikipedia.org/wiki/Social_network
3. Go to Preferences->Node pre-process->Database and enter "en.wikipedia.org/wiki/Social_network" in the Preserving text box.
4. Create the voson-analysis database.
5. Wait for the crawl to run (you will receive an email when it is finished). You will see the voson database has over 200 pages now: many of these will be Wikipedia pages that you want to have shown in your network e.g. http://en.wikipedia.org/wiki/University_of_California_Irvine, http://en.wikipedia.org/wiki/Small_world_experiment, http://en.wikipedia.org/wiki/Facebook, http://en.wikipedia.org/wiki/Collaboration_graph, http://en.wikipedia.org/wiki/Cohesive.
6. You need to then add the Wikipedia pages you want in your network as new seed sites. So, if the five pages listed above were of interest to you, then you would enter them as seeds.
7. Repeat step 3 above for each new seed, i.e. enter "en.wikipedia.org/wiki/University_of_California_Irvine" in the Preserving text box, press enter, then enter "en.wikipedia.org/wiki/Small_world_experiment", etc. Note: this is time-consuming if you have a lot of seeds. In a future version of the VOSON System, we will implement wildcards to improve this process.
8. Re-create the voson-analysis database.
9. Wait for the crawler to run.
10. You will then have a voson-analysis database which shows the connections between the seed pages i.e. Wikipedia pages that are related to "social networks". However, there will be many other pages in there that you probably won't want in your network e.g. non-Wikipedia pages, or Wikipedia pages in other languages. To get a subnetwork containing only the seed pages, use the Crosstabs Composition tool.
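Because the Preserving text box entries in steps 3 and 7 have to be entered one at a time, it can help to prepare them in advance. The short Python sketch below is not part of the VOSON System; it assumes, based on the examples in the steps above, that each entry is simply the seed URL without its "http://" prefix, and the function name preserve_entry is hypothetical.

```python
from urllib.parse import urlsplit


def preserve_entry(url):
    """Drop the scheme from a seed URL, leaving the host name plus the path."""
    parts = urlsplit(url)
    return parts.netloc + parts.path


# Seed pages taken from the steps above.
seed_pages = [
    "http://en.wikipedia.org/wiki/Social_network",
    "http://en.wikipedia.org/wiki/University_of_California_Irvine",
    "http://en.wikipedia.org/wiki/Small_world_experiment",
    "http://en.wikipedia.org/wiki/Facebook",
    "http://en.wikipedia.org/wiki/Collaboration_graph",
]

# Print one entry per line, ready to copy into the Preserving text box.
for url in seed_pages:
    print(preserve_entry(url))
```

Each printed line still has to be entered in the Preserving text box by hand, followed by pressing enter.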
7. In version 0.5.17.5 (01July2011) you changed the crawler behaviour so that the crawler now exits when it reaches a particular depth or level within a website (e.g. a depth of 2 means internal pages linked to by the seed URL and a depth of 3 means internal pages linked to by pages that are linked to by the seed URL). I want to recrawl some seed sites and I'm worried that the hyperlink network will be different just because of the change in the crawler behaviour.
Set the crawler depth to the maximum; the crawler will then be guided by the other webminer parameters and not by this new depth parameter. So, set "depth of crawl (levels)" to 4.