
MyMediaLite: Rating data files


Rating data files

Rating data files have at least three columns: the user ID, the item ID, and the rating value. User and item IDs are strings, and the rating value is a single-precision (32-bit) floating-point number.

An additional column may hold date/time information or a numerical timestamp. It is used when the chosen method requires time information, and ignored otherwise.
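To make the column layout concrete, here is a minimal Python sketch of how one line of such a file could be parsed. It is not MyMediaLite's own reader: the function name parse_rating_line is hypothetical, and the sketch assumes that columns are separated by tabs, spaces, or commas and that repeated separators count as one.

```python
import re
import struct

# Assumption: columns are separated by a tab, a space, or a comma;
# runs of separators are treated as a single separator.
_SEPARATORS = re.compile(r"[\t ,]+")


def parse_rating_line(line):
    """Return (user_id, item_id, rating) from one line of a rating file.

    Hypothetical helper for illustration; any columns after the third
    (for example a timestamp) are ignored here.
    """
    fields = _SEPARATORS.split(line.strip())
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 columns, got {len(fields)}: {line!r}")
    user_id, item_id = fields[0], fields[1]  # IDs stay strings
    # Round-trip through a 32-bit float to mimic single-precision storage.
    rating = struct.unpack("f", struct.pack("f", float(fields[2])))[0]
    return user_id, item_id, rating
```

Because the IDs are kept as strings, values such as u5951 and i50 are handled the same way as purely numeric IDs.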

Examples

Tab-separated columns (.tsv)

5951    50      5
5951    223     5
5951    260     5
5951    293     5
5951    356     4
5951    364     3
5951    457     3

Tab-separated columns (.tsv) with non-integer ratings

u5951    i50      5.0
u5951    i223     5.0
u5951    i260     5.0
u5951    i293     4.5
u5951    i356     4.0
u5951    i364     3.5
u5951    i457     3.0

Space-separated columns

5951 50 5
5951 223 5
5951 260 5
5951 293 5
5951 356 4
5951 364 3
5951 457 3

Comma-separated columns (.csv)

5951,50,5
5951,223,5
5951,260,5
5951,293,5
5951,356,4
5951,364,3
5951,457,3

Rating data with timestamps

5951,50,5,978300760
5951,223,5,978302109
5951,260,5,978301968
5951,293,5,978300275
5951,356,4,978824291
5951,364,3,978302268
5951,457,3,978300719

Rating data with dates

5951 50 5 2005-12-04
5951 223 5 2005-12-04
5951 260 5 2005-12-04
5951 293 5 2005-11-27
5951 356 4 2005-11-27
5951 364 3 2005-11-27
5951 457 3 2005-11-27

Rating data with dates and times

5951    50     5    "2009-08-05 00:50:30"
5951    223    5    "2009-08-02 17:19:33"
5951    260    5    "2010-05-04 21:21:03"
5951    293    5    "2009-09-25 05:04:24"
5951    356    4    "2010-06-30 02:07:57"
5951    364    3    "2010-06-11 04:54:41"
5951    457    3    "2010-06-11 14:26:32"

It does not matter whether the date/time information is in quotation marks or not.
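The three time variants shown above (numerical timestamp, date, date and time with optional quotation marks) could be handled along the following lines. This is only a sketch under stated assumptions, not the library's parser: the function names are hypothetical, purely numeric values are assumed to be seconds since the Unix epoch in UTC, and dates are assumed to use the YYYY-MM-DD layout of the examples.

```python
import re
from datetime import datetime, timezone


def parse_time_field(text):
    """Parse a time column: Unix timestamp, date, or date and time."""
    text = text.strip().strip('"')  # quotation marks are optional
    if text.isdecimal():
        # Assumption: numeric values are seconds since the Unix epoch (UTC).
        moment = datetime.fromtimestamp(int(text), tz=timezone.utc)
        return moment.replace(tzinfo=None)
    for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d"):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized date/time value: {text!r}")


def parse_timed_rating_line(line):
    """Return (user_id, item_id, rating, time) from one line."""
    # Split at most three times so that a date/time value containing a
    # space stays together in the fourth field.
    fields = re.split(r"[\t ,]+", line.strip(), maxsplit=3)
    if len(fields) < 4:
        raise ValueError(f"expected 4 columns, got {len(fields)}: {line!r}")
    user_id, item_id, rating_text, time_text = fields
    return user_id, item_id, float(rating_text), parse_time_field(time_text)
```

Limiting the split to the first three separators is what keeps a value such as "2009-08-05 00:50:30" in one piece, whether or not it is quoted.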

Command-line tools

Rating prediction

MyMediaLite's rating prediction tool supports all three column separators shown above (tab, space, and comma) by default for its --training-data=FILE and --test-data=FILE arguments. If you know that all your ratings are integers between 0 and 255, you can use --rating-type=byte to save memory.
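The memory saving from byte ratings is simple arithmetic: a 32-bit float takes four bytes per rating, a byte takes one. The Python sketch below only illustrates that ratio with the standard array module; it says nothing about how MyMediaLite lays out its data internally, and the variable names are made up for the example.

```python
from array import array

# The seven ratings from the examples in this document.
ratings = [5, 5, 5, 5, 4, 3, 3]

as_float = array("f", ratings)  # single-precision floats
as_byte = array("B", ratings)   # unsigned bytes, valid only for integers 0..255

# Payload size in bytes for each representation.
float_bytes = as_float.itemsize * len(as_float)
byte_bytes = as_byte.itemsize * len(as_byte)
```

A non-integer rating such as 4.5 cannot be stored in the byte array, which is why the option only applies to integer ratings in the range 0 to 255.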

If you use --prediction-file=FILE, you will get a tab-separated file with predictions for the test data. With --prediction-line="FORMAT" you can change the line format. For example, to put the items in the first column and separate the columns with commas, use --prediction-line="{1},{0},{2}".
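The effect of such a format string can be tried out in Python, whose str.format() treats simple numbered placeholders the same way. The sketch assumes that {0} is the user ID, {1} the item ID, and {2} the predicted rating, which is inferred from the example above rather than stated by it. The helper name format_prediction is hypothetical.

```python
def format_prediction(line_format, user_id, item_id, prediction):
    """Fill a line format with one prediction.

    Assumption: {0} = user ID, {1} = item ID, {2} = predicted rating.
    """
    return line_format.format(user_id, item_id, prediction)


# Tab-separated columns, in user/item/rating order.
default_line = format_prediction("{0}\t{1}\t{2}", "5951", "50", 4.5)

# Items first, comma-separated, as in the --prediction-line example.
item_first_line = format_prediction("{1},{0},{2}", "5951", "50", 4.5)
```

Here item_first_line is "50,5951,4.5": the format string controls both the column order and the separator.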

Item recommendation

The item recommendation tool also supports this rating data format. By default, every rating is interpreted as positive feedback and the rating value is ignored. You can use the option --rating-threshold=NUM to define the minimum rating value that is interpreted as positive feedback. Ratings below this value are ignored.
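The thresholding rule can be sketched in a few lines of Python. This is an illustration of the behaviour described above, not the tool's code. The function name positive_feedback and the tuple layout are assumptions made for the example.

```python
def positive_feedback(ratings, rating_threshold=None):
    """Turn (user_id, item_id, rating) triples into (user_id, item_id) pairs.

    Without a threshold every rating counts as positive feedback.
    With a threshold, ratings below it are dropped.
    """
    pairs = []
    for user_id, item_id, rating in ratings:
        if rating_threshold is not None and rating < rating_threshold:
            continue  # below the minimum: ignored
        pairs.append((user_id, item_id))
    return pairs


# Ratings taken from the examples in this document.
sample = [
    ("5951", "50", 5.0), ("5951", "223", 5.0), ("5951", "260", 5.0),
    ("5951", "293", 5.0), ("5951", "356", 4.0), ("5951", "364", 3.0),
    ("5951", "457", 3.0),
]
all_feedback = positive_feedback(sample)
liked_only = positive_feedback(sample, rating_threshold=4)
```

With a threshold of 4, the two ratings of 3 are dropped and five user/item pairs remain. A rating equal to the threshold is kept.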

Classes

Reading

The classes that read this kind of rating data are MyMediaLite.IO.RatingData and MyMediaLite.IO.StaticRatingData. The former's Read() method returns a data structure that can be updated or modified, while the latter's result cannot be updated.

If you also want to read in time information, use the static class MyMediaLite.IO.TimedRatingData.

Writing

MyMediaLite.RatingPrediction.Extensions contains an extension method WritePredictions() that lets you write the predictions to a target, either a file or a TextWriter, for example:

    recommender.WritePredictions(ratings, user_mapping, item_mapping, target);

This will use tabs as column separators.

If you want other separators, provide a line format string to the method:

    recommender.WritePredictions(ratings, user_mapping, item_mapping, target, "{0}|{1}|{2}");
