Google has updated and relaunched the popular open source software for analyzing, cleaning and, in some cases, transforming entire data sets. The redesigned version is called Google Refine. In July, the Mountain View, California-based search giant acquired Metaweb, the company that originally developed the software. The tool was first released as Freebase Gridworks, and Google has now given it a new name.
What Is Google Refine?
For newcomers unfamiliar with data management software: Google Refine is a collection of tools for wrangling useful information out of a data set, particularly a data set that contains inconsistencies.
For example, this desktop application can find all the variant spellings of a particular word in a data set and replace them with a single, consistent term. This procedure is usually called normalization, and it is nothing new in the database world. Christopher Groskopf, a developer at the Chicago Tribune, noted in a blog post that normalization usually requires an algorithm written specifically for one data set. According to Groskopf, the genius of Gridworks (now Google Refine) is that it is generic: it works across a broad range of data sets without requiring new code or a new algorithm for each one. Better still, the cleanup steps are portable, so the process used to clean up 2009's data can be repeated on the next year's data in 2010.
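Here is a minimal Python sketch that makes the normalization idea concrete. It uses key-collision clustering, similar in spirit to the "fingerprint" method described in Refine's clustering documentation. It is an illustration under stated assumptions, not Refine's actual code. The function names `fingerprint`, `cluster` and `normalize` are hypothetical. Picking the most frequent spelling as the canonical form is also an assumption made for this example. Because nothing in the code is tied to one data set, the same functions could be rerun on next year's file, which is the portability point made above.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Hypothetical helper: build a key that ignores case, accents,
    punctuation and word order, so variant spellings collide."""
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", value)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase and turn ASCII punctuation into spaces.
    cleaned = plain.strip().lower()
    cleaned = cleaned.translate(
        str.maketrans(string.punctuation, " " * len(string.punctuation))
    )
    # Deduplicate and sort tokens so "Tribune, Chicago" == "Chicago Tribune".
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group values by fingerprint; keep only groups with 2+ distinct spellings."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return {k: vs for k, vs in groups.items() if len(set(vs)) > 1}


def normalize(values):
    """Replace every value with the most frequent spelling in its cluster
    (an assumed rule; ties go to the spelling seen first)."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    canonical = {k: max(members, key=members.count) for k, members in groups.items()}
    return [canonical[fingerprint(v)] for v in values]


names = ["Chicago Tribune", "chicago tribune", "Tribune, Chicago", "ProPublica", "Chicago Tribune"]
print(normalize(names))
# ['Chicago Tribune', 'Chicago Tribune', 'Chicago Tribune', 'ProPublica', 'Chicago Tribune']
```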
The software includes many other tools as well, among them a complete expression language for examining a data set. Filters can also be applied to isolate subsets of the data. Those subsets can then be changed or analyzed with a set of commands, chiefly the transform commands.
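The filter-then-transform workflow can be sketched in Python as follows. This is not Refine's expression language. Python lambdas stand in for the filter and the transform, and the `apply_to_subset` helper and the sample rows are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def apply_to_subset(rows: List[Row], predicate: Callable[[Row], bool],
                    column: str, transform: Callable[[str], str]) -> List[Row]:
    """Hypothetical helper: apply `transform` to `column`, but only on rows
    that pass `predicate` (the filter). Returns new rows; inputs are untouched."""
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the original rows stay unchanged
        if predicate(row):
            new_row[column] = transform(row[column])
        result.append(new_row)
    return result


rows = [
    {"state": "IL", "amount": " 1,200 "},
    {"state": "NY", "amount": "300"},
    {"state": "IL", "amount": "45"},
]

# Filter: only Illinois rows. Transform: trim spaces and drop thousands separators.
cleaned = apply_to_subset(
    rows,
    predicate=lambda r: r["state"] == "IL",
    column="amount",
    transform=lambda v: v.strip().replace(",", ""),
)
print([r["amount"] for r in cleaned])  # ['1200', '300', '45']
```

Returning copies instead of editing rows in place keeps the original data available. That loosely parallels how a cleanup step can be reviewed before it is kept.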
Google Refine works with plain text files in which the data is split into columns by commas (comma-separated values). The results can be exported as JSON (JavaScript Object Notation), which can then be quickly converted into other formats, such as an HTML table.
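Below is a rough equivalent of this import and export path, written with Python's standard `csv`, `json` and `html` modules. It only mirrors the data flow described in the paragraph. The sample data and the `to_html_table` helper are hypothetical, and Refine's own exporters work differently.

```python
import csv
import html
import io
import json

# Hypothetical sample data; the quoted field contains a comma.
csv_text = """name,city,amount
Acme Corp,Chicago,1200
"Smith, Jones & Co",New York,300
"""

# Parse comma-separated text into one dict per row.
# The csv module handles quoted fields that themselves contain commas.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Export the rows as JSON (a list of objects).
as_json = json.dumps(rows, indent=2)


def to_html_table(records):
    """Hypothetical helper: render a list of dicts as an HTML table,
    escaping cell text so characters like & and < stay safe."""
    headers = list(records[0].keys())
    lines = ["<table>"]
    lines.append("  <tr>" + "".join(f"<th>{html.escape(h)}</th>" for h in headers) + "</tr>")
    for rec in records:
        lines.append("  <tr>" + "".join(f"<td>{html.escape(rec[h])}</td>" for h in headers) + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)


print(as_json)
print(to_html_table(json.loads(as_json)))
```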
Google has also announced a series of new features in this release, now officially called Google Refine 2.0. Along with the new ability to link records to other databases, it adds a number of new expressions and transformation commands.
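Linking records to another database is usually called reconciliation. The Python sketch below shows the simplest form of the idea: normalize each local name and look it up in a reference table. The `REFERENCE_IDS` table, its IDs and the helper names are all hypothetical. A real reconciliation service typically searches a remote database and ranks several candidate matches, rather than requiring an exact key match as this sketch does.

```python
from typing import Dict, List, Optional

# Hypothetical reference database: normalized name -> external record ID.
REFERENCE_IDS: Dict[str, str] = {
    "acme corp": "ext-001",
    "globex": "ext-002",
}


def match_key(name: str) -> str:
    """Normalize a name (lowercase, drop periods, collapse whitespace)
    so local and reference names are compared the same way."""
    return " ".join(name.lower().replace(".", "").split())


def reconcile(names: List[str], reference: Dict[str, str]) -> List[Optional[str]]:
    """Link each local name to an external ID, or None when no exact key matches."""
    return [reference.get(match_key(n)) for n in names]


print(reconcile(["Acme Corp.", "  GLOBEX ", "Initech"], REFERENCE_IDS))
# ['ext-001', 'ext-002', None]
```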
The non-profit organization ProPublica has used the software in several projects to aggregate data from many different data sets, such as those released by pharmaceutical companies.