EE board - OCR job from recruiter - it is interesting but I can't do i (repost)
c***z
Posts: 6348
1
[ The following text is reposted from the DataSciences board ]
From: chaoz (面朝大海,吃碗凉皮), Board: DataSciences
Subject: OCR job from recruiter - it is interesting but I can't do it, yet
Posted at: BBS 未名空间站 (Fri Jun 20 12:25:07 2014, US Eastern)
If you would be interested please let me know at h******[email protected] as
early as possible.
Job Title: Data Scientist with Machine Learning and Natural Language
Processing Experience
Company: BITS
Task 1: Extend NIST Scientific Text Extraction System
Description of Tasks
I. Implement distributed PDF to image conversion subsystem that converts
pages of scientific articles to individual images.
II. Implement distributed optical character recognition-based text
extraction system that extracts text from images of individual pages and
prepares them for further processing by error correcting, machine learning,
and natural language systems.
III. Develop installation scripts for developed system and required tools
to facilitate installation on Linux virtual machines.
IV. Configure Linux virtual machine images with developed system and
necessary software tools and libraries for deployment in a distributed
virtualized system such as cloud computing.
V. Develop system documentation for deployment, maintenance, and
operation.
Deliverables:
The deliverables for the tasks under Task 1 are:
1. PDF to Image Conversion subsystem in Python using ImageMagick to
perform the actual image conversion. Distribution of computation implemented
using Redis and Thoonk to create a distributed job queue system in which a
publisher node enters PDF ids from a fileserver into the job queue for
distributed worker nodes to fetch and convert into images. Images should be
returned to the fileserver as completed work units in zip files containing
the images. Subsystem should be fault tolerant and include the necessary
error handling and logging to disk to allow for uninterrupted operation over
long periods of time. Failure to perform an image conversion should not
prevent the system from continuing nor should information about the failure
be lost.
2. OCR-based Text Extraction subsystem in Python using OCRopus to extract
the text from the image files. Distribution of computation implemented
using Redis and Thoonk to create a distributed job queue system in which a
publisher node enters work unit identifiers (work units are generated during
image conversion) from a fileserver into the job queue for distributed
worker nodes to fetch and process. Extracted text should be added to zip
file-based work unit and sent back to the fileserver. Subsystem should be
fault tolerant and include the necessary error handling and logging to disk
to allow for uninterrupted operation over long periods of time. Failure to
perform a text extraction should not prevent the system from continuing nor
should information about the failure be lost.
3. Command-line installation scripts in Python that make use of existing
packaging and distribution facilities associated with Linux and Python
libraries when available.
4. A Linux Virtual Machine image compatible with existing VMware-based
infrastructure that has been configured for rapid deployment. The VM image
should contain a current patched version of Linux with the developed code
and its prerequisites installed via the installation script previously
developed.
5. System documentation, in Microsoft Word, for the entire Scientific
Text Extraction System. Documentation shall include an overview of the
architecture, data flow, use of and integration with Redis, deployment,
maintenance, and operation of the application.
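The fault-tolerance requirement in deliverables 1 and 2 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the NIST system: an in-memory `queue.Queue` stands in for the Redis/Thoonk job queue, and `convert_command`, `run_worker`, the `process` callback, and the failure-log format are hypothetical names made up for this sketch. The one external tool referenced is the ImageMagick `convert` command line (`magick` in ImageMagick 7), which the sketch builds but does not run.

```python
import queue
import zipfile
from pathlib import Path
from typing import Callable, Dict, List


def convert_command(pdf_path: Path, out_dir: Path, dpi: int = 300) -> List[str]:
    """Build an ImageMagick command line that writes one PNG per PDF page."""
    # ImageMagick expands "%04d" to the page index (0000, 0001, ...).
    return ["convert", "-density", str(dpi), str(pdf_path),
            str(out_dir / "page-%04d.png")]


def run_worker(jobs: "queue.Queue[str]",
               process: Callable[[str, Path], List[Path]],
               out_dir: Path,
               failure_log: Path) -> Dict[str, int]:
    """Drain the job queue; a failed job is logged to disk and skipped.

    `process` is a hypothetical callback (image conversion or OCR) that
    returns the files produced for one job id.
    """
    stats = {"done": 0, "failed": 0}
    while True:
        try:
            # Assumption: a real worker would block on the Redis queue here
            # instead of exiting when the local queue is empty.
            job_id = jobs.get_nowait()
        except queue.Empty:
            break
        try:
            produced = process(job_id, out_dir)
            # Package the results as one zip-file work unit per job.
            with zipfile.ZipFile(out_dir / f"{job_id}.zip", "w") as unit:
                for path in produced:
                    unit.write(path, arcname=path.name)
            stats["done"] += 1
        except Exception as exc:  # keep going, but never lose the failure
            with failure_log.open("a") as fh:
                fh.write(f"{job_id}\t{exc!r}\n")
            stats["failed"] += 1
    return stats
```

The same loop would serve both subsystems: only the `process` callback changes (running the ImageMagick command for deliverable 1, calling the OCR tool for deliverable 2).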
Task 2: Develop Graphical User Interface for Computational Soft Materials
Workbench for Multiscale Modeling.
I. Working with MML-specified prototype workbench code, extend the existing
C++-based GUI to design and implement menu bar items and dialog boxes which
can be connected to MML-specified libraries and tools.
Deliverables:
The deliverables for the tasks under Task 2 are:
1. A prototype of the workbench that can be used to illustrate key user
interface concepts. It consists of menus for ZENO and help, a toolbar, an
interface for Python, dialogs for Amorphous Builder, Trajectory Analysis
Tool, LAMMPS and GROMACS simulations, Coarse-Mapping Tool, Coarse-Grain
Structure Tool, Coarse-Grained Force Field Assignment, and ZENO.
Task 3: Develop Computational Soft Materials Workbench for Multiscale
Modeling.
I. Develop core application components.
II. Connect GUI to algorithms and tools.
III. Develop visualization of molecular structures.
IV. Implement facilities for reading and writing files in atomistic
formats.
V. Implement classes to interface to algorithms and tools for molecular
modeling.
VI. Create functionality for Molecular Modeling workflows.
VII. Write documentation for workbench system.
Deliverables:
The deliverables for the tasks under Task 3 are:
1. Identified APIs for GUI, data conversion, molecular visualization,
and extensibility. Implement class library to integrate
with APIs.
2. Interface classes to connect functionality to menus for ZENO and help,
a toolbar, an interface for Python, and multiple dialogs (Amorphous Builder,
Trajectory Analysis Tool, LAMMPS and GROMACS simulations, Coarse-Mapping
Tool, Coarse-Grain Structure Tool, Coarse-Grained Force Field Assignment,
and ZENO).
3. 3D visualization, rotation, zoom, selection of individual elements,
and display lists of molecular structures. Visualization of grouping of
highlighted elements into coarse grained elements.
4. Functionality to read and write atomistic data in a variety of domain
formats: CML, PDB, XYZ, LAMMPS (Data and Input), GROMACS (Data, Input, and
Trajectory), Coarse Grain (Mapping and Force Field Table).
5. Classes to interface with molecular modeling algorithms (Amorphous
Builder and Coarse-Graining)
and molecular modeling tools (LAMMPS, GROMACS, Coarse-Grained Structure
Building Tool, ZENO, and Trajectory Analysis Tool).
6. Workflow Functionality that supports a variety of molecular
calculations and computations.
7. Documentation of workbench, creation of GUI Help Menus and web pages.
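As a rough illustration of the file-format work in deliverable 4, here is a minimal reader and writer for the simplest of the listed formats, XYZ (atom count on the first line, a comment line, then one `element x y z` row per atom). The posting asks for a C++ workbench, so this Python version is only a sketch of the round trip; the names `Atom`, `parse_xyz`, and `format_xyz` are hypothetical, and extended XYZ variants with extra columns are ignored.

```python
from typing import List, NamedTuple, Tuple


class Atom(NamedTuple):
    element: str
    x: float
    y: float
    z: float


def parse_xyz(text: str) -> Tuple[str, List[Atom]]:
    """Parse a single-frame XYZ file into (comment, atoms)."""
    lines = text.splitlines()
    count = int(lines[0].strip())            # line 1: number of atoms
    comment = lines[1] if len(lines) > 1 else ""  # line 2: free-form comment
    atoms = []
    for line in lines[2:2 + count]:
        # Keep only the first four columns; extra columns are ignored.
        element, x, y, z = line.split()[:4]
        atoms.append(Atom(element, float(x), float(y), float(z)))
    if len(atoms) != count:
        raise ValueError(f"expected {count} atoms, found {len(atoms)}")
    return comment, atoms


def format_xyz(comment: str, atoms: List[Atom]) -> str:
    """Serialize atoms back to XYZ text."""
    rows = [str(len(atoms)), comment]
    rows += [f"{a.element} {a.x:.6f} {a.y:.6f} {a.z:.6f}" for a in atoms]
    return "\n".join(rows) + "\n"
```

Keeping the in-memory representation (`Atom`) separate from the parsing and formatting functions is one way to let the other formats in the list share the same data structure.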
QUALIFICATIONS OF CONTRACTOR KEY PERSONNEL
All contractor personnel working under this task order shall be designated
as Key Personnel. All Contractor Key Personnel working under this task order
must meet the following minimum qualifications.
• Minimum of 5 years experience with a scripting language, such as
Python, JavaScript, Perl, or PHP
• Minimum of 5 years experience with system languages such as C or
C++
• Minimum of 5 years experience with Agile Methodologies, such as
XP or SCRUM
• Minimum of 5 years experience with a combination of SQL and
NoSQL databases
• Minimum of 5 years experience developing web applications, with
HTML5, CSS3, JavaScript, jQuery, Web 2.0 technologies
• Minimum of 3 years experience developing RESTful interfaces
• Minimum of 1 year experience setting up virtual machines and
installing and making Debian packages
• Minimum of 5 years experience in developing graphical user
interfaces
• Minimum of 5 years experience with XML technologies