Quant board - Data Cleaning and Data Quality Control: Challenges of the Big Data Era, Part 3 (repost)
l******o
Posts: 52
1
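[The following text is reposted from the DataSciences board]
From: laihaobo (数海扬帆), Board: DataSciences
Subject: Data Cleaning and Data Quality Control: Challenges of the Big Data Era, Part 3
Posted via: BBS 未名空间站 (Tue Sep 30 09:08:14 2014, US Eastern)
Also known as data cleansing, data quality control, etc.
Late last year Gartner published a Data Quality Tools Magic Quadrant report
that summarizes the relevant vendors. I can't say whether their vendor picks
are sound, but their summary of data quality control is on point. Now that
data is being collected in such volume, an emphasis on data cleaning and data
quality control is especially necessary. Remember: "Garbage in, garbage out".
The report was originally available from http://www.gartner.com/technology/reprints.do?id=1-1LCD5XL&ct=131007&st=sb,
but not any more. I'm excerpting a bit of it here and attaching their current
paid link for reference, which also gives them a little free advertising.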
l******o
Posts: 52
2
http://gtnr.it/1tdIeVw
Magic Quadrant for Data Quality Tools
gartner.com, October 7
Data quality assurance is a discipline focused on ensuring that data is fit
for use in business processes ranging from core operations to analytics and
decision-making, regulatory compliance, and engagement and interaction with
external entities.
As a discipline, it comprises much more than technology — it also includes
roles and organizational structures, processes for monitoring, measuring,
reporting and remediating data quality issues, and links to broader
information governance activities via data-quality-specific policies.
Given the scale and complexity of the data landscape across organizations of
all sizes and in all industries, tools to help automate key elements of the
discipline continue to attract more interest and to grow in value. As such,
the data quality tools market continues to show substantial growth, while
exhibiting innovation and change.
The data quality tools market includes vendors that offer stand-alone
software products to address the core functional requirements of the
discipline, which are:
Data profiling and data quality measurement: The analysis of data to capture
statistics (metadata) that provide insight into the quality of data and
help to identify data quality issues.
Parsing and standardization: The decomposition of text fields into component
parts and the formatting of values into consistent layouts based on
industry standards, local standards (for example, postal authority standards
for address data), user-defined business rules, and knowledge bases of
values and patterns.
Generalized "cleansing": The modification of data values to meet domain
restrictions, integrity constraints or other business rules that define when
the quality of data is sufficient for an organization.
Matching: Identifying, linking or merging related entries within or across
sets of data.
Monitoring: Deploying controls to ensure that data continues to conform to
business rules that define data quality for the organization.
Enrichment: Enhancing the value of internally-held data by appending related
attributes from external sources (for example, consumer demographic
attributes and geographic descriptors).
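Several of the core functions above (profiling, parsing and standardization, cleansing, matching) can be sketched in a few lines of plain Python. This is only a minimal illustration, not how any vendor tool in the report works: the sample records, the field names, the tiny state-alias table, and the 0.9 similarity threshold are all made-up assumptions.

```python
import re
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical sample records; field names and values are invented for illustration.
RECORDS = [
    {"name": "Acme Corp.", "phone": "(212) 555-0101", "state": "ny"},
    {"name": "ACME Corporation", "phone": "212.555.0101", "state": "NY"},
    {"name": "Globex Inc", "phone": None, "state": "New York"},
    {"name": "Initech", "phone": "555-0199", "state": "TX"},
]

# Tiny user-defined "knowledge base" of values (an assumption, not a standard).
STATE_ALIASES = {"new york": "NY", "texas": "TX"}


def profile(records, field):
    """Profiling: capture simple statistics (metadata) about one field."""
    values = [r.get(field) for r in records]
    present = [v for v in values if v not in (None, "")]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "top": Counter(present).most_common(1),
    }


def standardize_phone(raw):
    """Parsing and standardization: keep the digits, format as NNN-NNN-NNNN."""
    digits = re.sub(r"\D", "", raw or "")
    if len(digits) != 10:
        return None  # cannot be standardized; leave for manual review
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


def cleanse_state(raw):
    """Cleansing: force the value into the allowed domain of two-letter codes."""
    value = (raw or "").strip()
    code = STATE_ALIASES.get(value.lower(), value.upper())
    return code if re.fullmatch(r"[A-Z]{2}", code) else None


def normalize_name(name):
    """Lowercase, drop punctuation and common company suffixes before matching."""
    words = re.sub(r"[^a-z0-9 ]", "", name.lower()).split()
    return " ".join(w for w in words if w not in {"corp", "corporation", "inc"})


def match(records, threshold=0.9):
    """Matching: pair up records whose normalized names are near-identical."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            a = normalize_name(records[i]["name"])
            b = normalize_name(records[j]["name"])
            if SequenceMatcher(None, a, b).ratio() >= threshold:
                pairs.append((i, j))
    return pairs
```

Calling profile(RECORDS, "phone") reports one missing value, and match(RECORDS) pairs the two Acme rows. The pairwise loop is quadratic, so a real matching step over large data would need some form of blocking or indexing first.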
In addition, data quality tools provide a range of related functional
abilities that are not unique to this market but that are required to
execute many of the core functions of data quality, or for specific data
quality applications:
Connectivity/adapters: The ability to interact with a range of different
data structure types.
Subject-area-specific support: Standardization capabilities for specific
data subject areas.
International support: The ability to offer relevant data quality operations
on a global basis (such as handling data in multiple languages and writing
systems).
Metadata management: The ability to capture, reconcile and interoperate
metadata related to the data quality process.
Configuration environment: Capabilities for creating, managing and deploying
data quality rules.
Operations and administration: Facilities for supporting, managing and
controlling data quality processes.
Workflow/data quality process support: Processes and user interfaces for
various data quality roles, such as data stewards.
Service enablement: Service-oriented characteristics and support for
service-oriented architecture (SOA) deployments.
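To make the "configuration environment" item and the monitoring function listed earlier a bit more concrete, here is a rough sketch in which data quality rules are declared as data and then applied to each incoming batch. The rule names, the fields (ticker, price), and the 5% alert threshold are hypothetical choices for illustration; real tools expose far richer rule languages.

```python
# Hypothetical rule set: names, fields and checks are illustrative assumptions.
RULES = [
    {"name": "price_positive", "field": "price",
     "check": lambda v: v is not None and v > 0},
    {"name": "ticker_present", "field": "ticker",
     "check": lambda v: bool(v)},
]


def monitor(records, rules, max_failure_rate=0.05):
    """Apply every rule to every record; flag rules that fail too often."""
    report = {}
    for rule in rules:
        failures = sum(
            1 for r in records if not rule["check"](r.get(rule["field"]))
        )
        rate = failures / len(records) if records else 0.0
        report[rule["name"]] = {
            "failures": failures,
            "rate": rate,
            "alert": rate > max_failure_rate,  # True means the batch needs attention
        }
    return report
```

Keeping the rules in a plain list separates what is checked from how it is run, so a data steward could add or retire a rule without touching the monitoring loop.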
The tools provided by vendors in this market are generally consumed by end-
user organizations for internal deployment in their IT infrastructure — to
directly support transactional processes that require data quality
operations and to enable staff in data-quality-oriented roles (such as data
stewards) to engage in data quality improvement work. Off-premises solutions
in the form of hosted data quality offerings, SaaS delivery models and
cloud services continue to evolve and grow in popularity.
g********s
Posts: 3652
3
SAP has tools built specifically for this.
l******o
Posts: 52
4
This is probably not a minor issue.