This presentation will summarize preliminary results of research into a quantitative analysis of metadata quality, using DPLA metadata as a test case. Data pre-processing, including converting metadata into a numeric representation of "completeness" and merging with Google Analytics data, will be discussed. Visualizations will be demonstrated, including representations of "usage" drawn from Google Analytics, comparison of metadata characteristics across hubs, and identification of unusual, "outlier" values. Finally, the presentation will outline next steps and initial work on analyzing and visualizing the content of metadata fields, applying natural language processing techniques, and evaluating that content's relationship to Google Analytics usage.
Visual accompaniment to Corey's presentation: http://nbviewer.ipython.org/github/chrpr/dpla-analytics/blob/master/nltk/demo.ipynb
Notes available at https://drive.google.com/open?id=17CV6JHXhArzw6B8WUdAsuUXlFGpiVouSB-1Ttw241H0&authuser=1
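
As a rough illustration of the pre-processing step described in the abstract, the sketch below turns each metadata record into a numeric "completeness" representation and joins it to page-view counts. It is a minimal sketch only, not the method used in the research: the field list, the record layout, the `pageviews` mapping, and all function names are assumptions made for this example, and the sample data is invented.

```python
# Hypothetical field list and record shapes; the actual DPLA fields and the
# Google Analytics export used in the research are not given in the abstract.
FIELDS = ["title", "description", "creator", "date", "subject", "rights"]


def completeness_vector(record):
    """Return 1 for each field in FIELDS that has a non-empty value, else 0."""
    vector = []
    for field in FIELDS:
        value = record.get(field)
        if isinstance(value, str):
            value = value.strip()  # treat whitespace-only strings as empty
        vector.append(1 if value else 0)
    return vector


def completeness_score(record):
    """Fraction of FIELDS that are populated, between 0.0 and 1.0."""
    vector = completeness_vector(record)
    return sum(vector) / len(vector)


def merge_with_usage(records, pageviews):
    """Attach a completeness score and a page-view count to each record.

    Records with no entry in `pageviews` get a count of 0.
    """
    merged = []
    for record in records:
        merged.append({
            "id": record["id"],
            "hub": record.get("hub"),
            "completeness": completeness_score(record),
            "pageviews": pageviews.get(record["id"], 0),
        })
    return merged


# Invented sample data, for illustration only.
records = [
    {"id": "a1", "hub": "hub-x", "title": "Map of Boston", "creator": "Unknown",
     "date": "1850", "subject": ["Maps"], "rights": "Public domain"},
    {"id": "b2", "hub": "hub-y", "title": "Untitled", "description": "  "},
]
pageviews = {"a1": 42}

rows = merge_with_usage(records, pageviews)
for row in rows:
    print(row)
```

Keeping the per-field 0/1 vector alongside the single score would allow comparison of individual fields across hubs, while the merged rows give one place to look for any relationship between completeness and usage.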