Data Science Glossary 2019-05-10T14:26:31-04:00

Tired of buzzword bingo? We demystify some common data science terms.

  • (Big) Data Management
    We build pipelines from data ingestion through curation and storage, whether on premises, in the cloud, or in data centers. This includes strategy around data collection.
  • Data Enrichment
    An organization’s data can be augmented in ways that improve business insight and empower predictive analytics. We use extensive knowledge of open source data to enrich your proprietary data sources.
  • Web Scraping
    Sometimes the data you need isn’t structured or curated but lives on the unstructured web. We build scripts that crawl the web and scrape the data from its pages.
  • Business Intelligence
    Data should be used to derive actionable business intelligence.  Our primary goal is to use data to contribute to business value. This is done through statistical analysis, data visualization/reporting, and machine learning.
  • Statistical Analysis
    Statistical analysis is most often used to gain high-level knowledge of the data, which in turn motivates further business intelligence efforts. It can create actionable business intelligence on its own or in combination with a reporting solution, and it is often considered a necessary ingredient for machine learning.
  • Data Visualization
    Data provides business intelligence, but if a stakeholder cannot understand it, it is difficult to convert that intelligence into business value. Visualization and reporting bridge that gap. They are also necessary when presenting results from statistical analysis or machine learning.
  • Machine Learning
    Programming computers to learn from data in order to perform specific tasks. Most often, this is some form of prediction or optimization, although it can also be useful for general pattern mining.
  • Deep Learning
    Using deep neural networks to perform machine learning. When the situation calls for it, deep learning can outperform classical methods and provide state-of-the-art performance. We find that deep learning is most useful with sequential data, image data, or learning from simulated environments.
  • Natural Language Processing (NLP)
    Much of the world’s data comes in natural language, which is often unstructured. We combine classical methods and modern deep learning to extract actionable insights and build predictive analytics from text data in all its forms.
  • Pattern Mining
    Although pattern mining is useful for all forms of machine learning, it is most useful in “unsupervised” settings, where the data has no labeled outcome and so cannot naturally be used for predictive analytics. It often provides business intelligence on its own, and it can serve as a stepping stone to predictive analytics.
  • Data Lake vs Data Warehouse
    Where you store your data depends on what type of data you have. A “Data Lake” holds raw, unprocessed data, which frequently comes in varying structures with no defined relations between them. A “Data Warehouse” also gathers data from many sources, but it stores that data in structured, relational form, after it has been processed to fit a defined schema.
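
To make the Data Lake vs Data Warehouse distinction concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the sample records, the field names, the `sales` table, and the use of a temporary folder and an in-memory SQLite database as stand-ins for a real lake and a real warehouse. Raw records of any shape are saved untouched to the "lake", and only the records that can be mapped onto a fixed set of columns are loaded into the "warehouse".

```python
import json
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical raw records with inconsistent shapes, as they might
# arrive from different sources.
raw_events = [
    {"source": "web", "user": "ana", "amount": "19.99"},
    {"source": "pos", "customer_name": "Ben", "total": 5.0, "store": 12},
    {"source": "sensor", "reading": 0.73},  # no customer or amount at all
]


def write_to_lake(events, lake_dir):
    """Store every record as-is, one JSON file per record."""
    for i, event in enumerate(events):
        (lake_dir / f"event_{i}.json").write_text(json.dumps(event))


def load_warehouse(lake_dir, conn):
    """Map lake files onto one fixed relational schema, skipping misfits."""
    conn.execute(
        "CREATE TABLE sales ("
        "source TEXT NOT NULL, customer TEXT NOT NULL, amount REAL NOT NULL)"
    )
    for path in sorted(lake_dir.glob("event_*.json")):
        event = json.loads(path.read_text())
        # Different sources use different field names for the same idea.
        customer = event.get("user") or event.get("customer_name")
        amount = event.get("amount", event.get("total"))
        if customer is None or amount is None:
            continue  # this record stays in the lake only
        conn.execute(
            "INSERT INTO sales VALUES (?, ?, ?)",
            (event["source"], customer, float(amount)),
        )
    conn.commit()


def run_demo():
    """Return (number of files in the lake, rows in the warehouse)."""
    with tempfile.TemporaryDirectory() as tmp:
        lake_dir = Path(tmp)
        write_to_lake(raw_events, lake_dir)
        files_in_lake = len(list(lake_dir.glob("*.json")))

        conn = sqlite3.connect(":memory:")
        load_warehouse(lake_dir, conn)
        rows = conn.execute(
            "SELECT source, customer, amount FROM sales ORDER BY amount"
        ).fetchall()
        conn.close()
    return files_in_lake, rows


if __name__ == "__main__":
    files_in_lake, rows = run_demo()
    print("files in lake:", files_in_lake)
    print("rows in warehouse:", rows)
```

In this sketch the lake keeps all three records, including the sensor reading that has no place in the `sales` table, while the warehouse holds only the two records that fit its columns. A production setup would typically use object storage and a dedicated warehouse engine, but the division of labor is the same.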