Visipedia
- visipedia.caltech.edu
The Concept of Visipedia
Visipedia is a visual interface to Wikipedia. Users can query Wikipedia with images, rather than textual keywords. A typical user will, for example, take a picture of a plant they do not know and submit it to Visipedia. Visipedia will return a Wikipedia page describing the plant.
The Visipedia also acts as an image organization tool that complements Wikipedia. For example, all images in the Visipedia database are linked directly to Wikipedia articles, or to Visipedia image categories (that in turn are linked to Wikipedia articles). This organization allows users to find all images in the Visipedia database that are related to certain Wikipedia articles.
Main Components
- Users
- Visipedia Web Server (provides user interface, acts as the link between users and the recognition servers)
- Visipedia Database
- Visipedia Images
- Visipedia Categories
- Recognition Servers (perform the machine ranking of visual search results)
- Human experts
- Amazon Mechanical Turk
- Flickr
- Wikipedia
Component Interaction
- Users submit queries to the Visipedia Web Server, which relays the image query to one or more Recognition Servers.
- The Recognition Server runs the Machine Vision Algorithms to find Visipedia-categories and Wikipedia-articles related to the image query. It sends the results back to the Visipedia Web Server, which displays them to the user.
- Visipedia continuously tries to expand its database with new, annotated images, either with the help of experts or by querying Flickr and sending the results to the Amazon Mechanical Turk.
- Both the Visipedia Web Server and the Recognition Servers have access to the Visipedia Database (images and their attributes).
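The query path described above could be sketched roughly as follows. This is only an illustration: the names `Match`, `RecognitionServer` and `relay_query`, the numeric scores, and the rule of keeping the best score per article when several servers answer are all assumptions, not part of the design.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Match:
    article: str  # title of a related Wikipedia article
    score: float  # hypothetical relevance score, higher is better


# Assumption: a recognition server is modeled as a plain function
# that maps image bytes to a list of matches.
RecognitionServer = Callable[[bytes], List[Match]]


def relay_query(image: bytes, servers: List[RecognitionServer]) -> List[Match]:
    """Relay the image to each server and merge the answers into one ranking."""
    best: Dict[str, Match] = {}
    for server in servers:
        for match in server(image):
            current = best.get(match.article)
            # Keep only the highest-scoring match for each article.
            if current is None or match.score > current.score:
                best[match.article] = match
    return sorted(best.values(), key=lambda m: m.score, reverse=True)


# Two stand-in servers with made-up answers.
def server_a(image: bytes) -> List[Match]:
    return [Match("Common loon", 0.8), Match("Mallard", 0.3)]


def server_b(image: bytes) -> List[Match]:
    return [Match("Mallard", 0.5)]


ranked = relay_query(b"fake-image-bytes", [server_a, server_b])
```

The web server would then display `ranked` to the user, most relevant article first.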
Role of Wikipedia
Visipedia acts as a complement to Wikipedia, but will never replace any part of Wikipedia. For example, we will not act as a mirror host of images; we will only provide links to images already on Wikipedia, Flickr, or elsewhere on the Web. The only exception is if users upload images to our database, but we will strongly encourage them to use Flickr instead and give us the URL. Also, we will not host any additional information about categories or images (beyond simple words and short descriptions). For more information, we will always refer to a Wikipedia article.
Role of Flickr and the Mechanical Turk
There are many photo-sharing sites, but we will initially focus on interactions with Flickr and ignore Google Images and Picasa-Web. This is to make our life easier, and because Takeshi already has all the code for Flickr interaction.
Flickr will mostly be used for expanding the Visipedia database by sending Flickr tag-query results to the Mechanical Turk with the question "Is X in the image?".
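As a small sketch of that step, assuming the Flickr tag query has already returned a list of image URLs: the function name `build_turk_tasks` and the task dictionary layout are hypothetical, and no real Flickr or Mechanical Turk API is called here.

```python
from typing import Dict, List


def build_turk_tasks(category: str, image_urls: List[str]) -> List[Dict[str, str]]:
    """Turn Flickr tag-query results into one yes/no question per image."""
    question = f"Is {category} in the image?"
    return [{"image_url": url, "question": question} for url in image_urls]


# Placeholder URLs standing in for real Flickr results.
tasks = build_turk_tasks(
    "duck",
    ["http://example.org/1.jpg", "http://example.org/2.jpg"],
)
```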
Structure of Visipedia Database
Examples / Images
The Visipedia Database is a database of images that are organized into categories and linked to Wikipedia articles.
Each image file has one or more bounding boxes with the following attributes attached:
- Any categories to which the bounding box is related (e.g. "bird", "duck")
- Any Wikipedia articles which are related to what is inside the bounding box (e.g. the article for "Donald Duck")
- Textual description (completely optional).
There will always be at least one bounding box, since if no box was provided, the bounding box will be the whole image frame. For this purpose, there should also be scene-categories (e.g. "sunset").
Note that for an image to be in the Visipedia database, it must have at least one bounding box (which may be the whole image frame) with at least one associated category or Wikipedia article. Textual information alone is not enough.
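One possible way to represent these records is sketched below. The class and field names (`BoundingBox`, `ImageRecord`, `rect`, `is_admissible`) and the `(x, y, width, height)` box layout are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class BoundingBox:
    # (x, y, width, height); None means "not provided", i.e. whole frame.
    rect: Optional[Tuple[int, int, int, int]] = None
    categories: List[str] = field(default_factory=list)  # e.g. "bird", "duck"
    articles: List[str] = field(default_factory=list)    # Wikipedia article titles
    description: str = ""                                # completely optional

    def is_labeled(self) -> bool:
        # A description alone does not count as a label.
        return bool(self.categories or self.articles)


@dataclass
class ImageRecord:
    url: str
    width: int
    height: int
    boxes: List[BoundingBox] = field(default_factory=list)

    def __post_init__(self) -> None:
        # There is always at least one box: default to the whole frame.
        if not self.boxes:
            self.boxes.append(BoundingBox())
        for box in self.boxes:
            if box.rect is None:
                box.rect = (0, 0, self.width, self.height)

    def is_admissible(self) -> bool:
        """True if at least one box carries a category or an article."""
        return any(box.is_labeled() for box in self.boxes)


sunset = ImageRecord("http://example.org/sunset.jpg", 640, 480)
sunset.boxes[0].categories.append("sunset")  # scene category on the whole frame
```

Here `sunset` has no explicit box, so it gets a whole-frame box, and it becomes admissible only once the scene category is attached.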
Categories
Visipedia will have a separate category system from Wikipedia, since Wikipedia's system isn't complete enough. The categories will form a forest, i.e. a set of trees. A category can have many sub-categories. Different trees may be, for example, "objects" (with sub-categories such as "animals" or "human made"), "scenes", and "actions".
Categories have the following attributes attached:
- small textual description of <10 words (if not obvious, like "animals")
- Wikipedia links related to that category
Tags are not as good as categories, since they are non-hierarchical and therefore give less information, so let's not use them. Orphaned categories are in some sense tags, so they can be used as such.
Images can only be added to already existing categories. If a suitable category does not exist, it must be created first. (So you can't just type random categories: they must be placed somewhere in the hierarchy, or start a new category tree.)
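A minimal sketch of such a category forest, assuming categories are identified by unique names. The class `CategoryForest` and its methods are hypothetical; a real implementation would sit on top of the Visipedia Database.

```python
from typing import Dict, List, Optional


class CategoryForest:
    """Categories stored as a forest: each category has at most one parent."""

    def __init__(self) -> None:
        self.parent: Dict[str, Optional[str]] = {}  # None marks a tree root
        self.images: Dict[str, List[str]] = {}      # category -> image URLs

    def add_category(self, name: str, parent: Optional[str] = None) -> None:
        if name in self.parent:
            raise ValueError(f"category already exists: {name}")
        if parent is not None and parent not in self.parent:
            raise KeyError(f"unknown parent category: {parent}")
        self.parent[name] = parent  # parent=None starts a new tree
        self.images[name] = []

    def add_image(self, url: str, category: str) -> None:
        # Images can only go into categories that already exist.
        if category not in self.parent:
            raise KeyError(f"create the category first: {category}")
        self.images[category].append(url)

    def ancestors(self, name: str) -> List[str]:
        """Return the chain of parents from nearest to the tree root."""
        chain: List[str] = []
        current = self.parent[name]
        while current is not None:
            chain.append(current)
            current = self.parent[current]
        return chain


forest = CategoryForest()
forest.add_category("objects")
forest.add_category("animals", parent="objects")
forest.add_category("bird", parent="animals")
forest.add_category("duck", parent="bird")
forest.add_image("http://example.org/duck.jpg", "duck")
```

Because every category records its parent, an image filed under "duck" is implicitly also a "bird", an "animal" and an "object", which is the extra information plain tags would not give.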
Query Pool
Submitted image queries that couldn't be classified by the recognition servers, or where the user wasn't satisfied with the result, are placed in the query pool. This is a separate database of unclassified images. Experts can be asked to classify and label the images, after which they are added to the Visipedia database.
This could easily be extended to asking users what they think is in the image, so that the experts can narrow down what type of images they classify/label. For example, if the user knows that the photo is of an old car, they may type "old car" in a box, and experts interested in classic cars can look specifically at those types of images.
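The query pool with optional user hints might look like the sketch below. The names `PooledQuery`, `QueryPool`, `matching` and `resolve`, and the simple substring match on the hint, are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PooledQuery:
    image_url: str
    user_hint: str = ""  # optional free text from the user, e.g. "old car"


class QueryPool:
    """Holds unclassified query images until an expert labels them."""

    def __init__(self) -> None:
        self._items: List[PooledQuery] = []

    def add(self, query: PooledQuery) -> None:
        self._items.append(query)

    def matching(self, interest: str) -> List[PooledQuery]:
        """Queries whose hint mentions the expert's interest (case-insensitive)."""
        needle = interest.lower()
        return [q for q in self._items if needle in q.user_hint.lower()]

    def resolve(self, query: PooledQuery) -> None:
        # Called once an expert has labeled the image and it has been
        # added to the Visipedia database.
        self._items.remove(query)

    def __len__(self) -> int:
        return len(self._items)


pool = QueryPool()
pool.add(PooledQuery("http://example.org/q1.jpg", "Old car"))
pool.add(PooledQuery("http://example.org/q2.jpg"))
car_queries = pool.matching("car")
```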
Visipedia User Interaction
There are three categories of users interacting with Visipedia:
- Average user: only interested in using Visipedia image query or browsing services, not necessarily contributing to it.
- Human Expert: users willing to lend their time to labeling images and organizing Visipedia.
- Mechanical Turk Users: paid a fee to do some of the human expert tasks (separate interface).
Average User: Querying Process
What kind of queries will the average user ask Visipedia?
- Identification: the user submits a query image and wants Visipedia to identify what is in the image. Result will be in terms of Wikipedia articles with corresponding example images.
- Scene identification (whole image).
- Object identification (ask for bounding box).
- Browsing: The user is reading a Wikipedia article, and wants images/photos related to that article. For example, the user is interested in more photos of Loons.
Interactive Querying / Query Refinement
If the system cannot immediately recognize what is in the image, it should ask the user for more information. For example:
- Basic Questions:
- Object or Scene?
- Provide bounding box.
- Add relevant textual terms.
- 20 questions: Is it similar to any of these images? (give a few example images)
Query Results
Identification: The user receives a list of relevant (ranked) Wikipedia articles with a few example images.
Browsing: The user types the name of a Wikipedia article, and gets a list of images and categories linked to that article.
"Don't know - provide more info" should be a valid response for the Recognition Servers. The result can then be placed in the query pool (see above), where experts can try to classify it "offline".
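A sketch of how the web server could treat a "don't know" answer. The names `RecognitionResult` and `handle_result`, the `min_score` cut-off of 0.5, and the use of a plain list as the query pool are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class RecognitionResult:
    # (Wikipedia article title, score) pairs; empty means "don't know".
    matches: List[Tuple[str, float]] = field(default_factory=list)


def handle_result(
    image_url: str,
    result: RecognitionResult,
    query_pool: List[str],
    min_score: float = 0.5,  # hypothetical confidence cut-off
) -> List[str]:
    """Return ranked article titles, or pool the query if nothing is confident.

    An empty return value stands for "Don't know - provide more info".
    """
    confident = [m for m in result.matches if m[1] >= min_score]
    if not confident:
        query_pool.append(image_url)  # experts can classify it offline
        return []
    confident.sort(key=lambda m: m[1], reverse=True)
    return [article for article, _ in confident]


pool: List[str] = []
answer = handle_result(
    "http://example.org/q1.jpg",
    RecognitionResult([("Mallard", 0.6), ("Common loon", 0.9), ("Sunset", 0.1)]),
    pool,
)
unknown = handle_result("http://example.org/q2.jpg", RecognitionResult(), pool)
```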
Expert Interactions
There are two categories of experts:
- Example-providers: experts who provide more image examples to Visipedia and label them (even if the labeling isn't complete).
- Labelers: experts who categorize images, add Wikipedia links, and label bounding boxes in images.
Most expert users will probably belong to both of these categories.
Contributing Images to the Visipedia Database
- Give the URL to an image on the web (e.g. Flickr url or Wikipedia image url).
- Upload image from local hard drive.
- Mechanical Turk: Once a Visipedia category gets filled with a few examples, we could search Flickr for that category, and send the results to Mechanical Turk with the question "Select a bounding box for CATEGORY X".
Labeling / Annotation and Organization
We will only have one type of selection tool: bounding boxes with associated Wikipedia articles or Visipedia-categories.
Later we may add other tools, such as visual feature selection or object outline. However, we will start with bounding boxes as a baseline, and use that for comparison when evaluating the usefulness of adding more selection tools.
For the annotations, it may be useful to record how sure the annotator (expert, Mechanical Turk user, or software labeler) is when providing the annotation. We can then apply a threshold and use only the "good examples" when training the recognition systems.
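The thresholding idea could be sketched as follows, assuming the annotator's certainty is stored as a number between 0 and 1. The `Annotation` fields, the function name `good_examples` and the default threshold of 0.8 are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Annotation:
    image_url: str
    label: str         # category or Wikipedia article
    source: str        # "expert", "turk" or "software"
    confidence: float  # assumed scale: 0.0 (guess) to 1.0 (certain)


def good_examples(annotations: List[Annotation], threshold: float = 0.8) -> List[Annotation]:
    """Keep only annotations whose stated confidence reaches the threshold."""
    return [a for a in annotations if a.confidence >= threshold]


annotations = [
    Annotation("http://example.org/1.jpg", "duck", "expert", 0.95),
    Annotation("http://example.org/2.jpg", "duck", "turk", 0.60),
    Annotation("http://example.org/3.jpg", "loon", "software", 0.80),
]
training_set = good_examples(annotations)
```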
More annotations:
- Add some text to an image (optional, but may be useful in some instances).
- Provide license information for an image (automatic if the image comes from Flickr or Wikipedia).
More information that experts can provide:
- Create new categories and subcategories for the category hierarchy.
- Add Wikipedia-links to categories.
- Add small descriptions to categories.
Future Features
This section describes some future features that we would like to add to the system, but that are not essential to get it started. These could be proposed as SURF projects for undergraduates.
Extra annotation tools
In addition to bounding boxes, we may want to add more annotation tools in the future. For example:
- Object Outline (link to category or wikipedia article)
- Whole scene (link to category or wikipedia article)
- Visual feature (e.g. beak, tail)
- Each category has a set of visual features, subcategories share these features. For example, a bird may have a beak and a tail, but some subcategories may have additional features such as a red mark somewhere.
- After a category acquires a few examples of a visual feature, we can send these examples (along with unlabeled images) to the Amazon Mechanical Turk to find more instances of these features in other images.
- Actions: Do we need separate Actions? E.g. for animate objects they can walk, move, court etc. They could be attached to categories like the visual features.
- Interestingness: Flickr has something like this. This could influence how high a particular image in the database is ranked when doing similarity searches or browsing the database.
All annotation tools, except for the bounding box, should be added later. We don't know if they will be useful, so we will start with the bounding box as a baseline.
More Search Features
Similarity Search: User is interested in photos that are visually similar to a submitted photo (or a photo in the Visipedia database).
- Scene or object similarity applies here also.
- This is not a focus right now; it is listed as a future feature.
Community and Reputation Systems
Experts could be motivated by getting "points" for the number of images they annotate, and the quality of the annotation. It may also be useful to add other social network features, such as comments, user pages etc, to get experts more involved in Visipedia.
By letting experts rate each other, we could get an idea of how reliable their annotations are. This may be useful for training the visual recognition engines, and for sorting image results.
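As one very simple reading of the peer-rating idea, an expert's reliability could be taken as the mean of the ratings they receive from other experts. The function name `reliability`, the 1 to 5 rating scale and the decision to ignore self-ratings are assumptions for this sketch.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def reliability(ratings: Iterable[Tuple[str, str, int]]) -> Dict[str, float]:
    """Mean peer rating per expert.

    Each rating is (rater, rated_expert, score), with score on an
    assumed 1-5 scale. Self-ratings are ignored.
    """
    received: Dict[str, List[int]] = defaultdict(list)
    for rater, expert, score in ratings:
        if rater != expert:
            received[expert].append(score)
    return {expert: mean(scores) for expert, scores in received.items()}


peer_ratings = [
    ("ann", "bob", 4),
    ("cat", "bob", 2),
    ("bob", "bob", 5),  # self-rating, ignored
    ("bob", "ann", 5),
]
scores = reliability(peer_ratings)
```

Such a score could then be used to weight annotations when training the recognition engines or when sorting image results.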