CM-1


Provided by:
Jane Hayes, University of Kentucky
 
Description:
This dataset is modified from the NASA Metrics Data Program's CM-1 project. The dataset contains 235 high-level and 220 low-level requirements. The trace for the dataset was manually verified. The "theoretical true trace" (answerset) built for this dataset consists of 361 correct links. Each high-level and low-level file contains the text of one requirement element.

The handtrace.txt file is the answerset. It maps high-level requirements to their low-level children. The handtrace.txt file has this format:

%
SRS5.1.1.10 DPUSDS4.4.1.1 DPUSDS5.1.0.2 DPUSDS5.1.2.3
%
SRS5.1.1.11 DPUSDS5.1.0.2 DPUSDS5.1.2.3 DPUSDS5.2.3.7.1 DPUSDS5.3.0.1 DPUSDS5.3.2.1.4
%
SRS5.1.1.12 DPUSDS5.1.4.2
..
..
..
..

where SRS5.1.1.12 is the identifier of a high-level requirement, and DPUSDS5.1.4.2 is the identifier of the only low-level requirement that traces to it. The items on each line are separated by tabs. For example, high-level requirement SRS5.1.1.10 has three child requirements: DPUSDS4.4.1.1, DPUSDS5.1.0.2, and DPUSDS5.1.2.3.
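The format described above can be read with a few lines of Python. The following is a minimal sketch, not part of the dataset distribution: the function name parse_handtrace is hypothetical, and the code assumes that a line containing only % separates records and that a line holding a single identifier means a high-level requirement with no children. It splits on any whitespace so that it tolerates copies of the file in which tabs were converted to spaces.

```python
from typing import Dict, List


def parse_handtrace(text: str) -> Dict[str, List[str]]:
    """Map each high-level requirement ID to its list of low-level child IDs.

    Assumptions (not confirmed by the dataset description):
    - a line containing only "%" is a record separator;
    - a line with a single ID is a parent with no children.
    """
    trace: Dict[str, List[str]] = {}
    for line in text.splitlines():
        fields = line.split()  # splits on tabs or runs of spaces
        # Skip blank lines and "%" separator lines.
        if not fields or fields[0] == "%":
            continue
        parent, children = fields[0], fields[1:]
        # Merge children if the same parent appears on more than one line.
        trace.setdefault(parent, []).extend(children)
    return trace


# Two records taken from the excerpt above, tab-separated.
sample = (
    "%\n"
    "SRS5.1.1.10\tDPUSDS4.4.1.1\tDPUSDS5.1.0.2\tDPUSDS5.1.2.3\n"
    "%\n"
    "SRS5.1.1.12\tDPUSDS5.1.4.2\n"
)
trace = parse_handtrace(sample)
link_count = sum(len(children) for children in trace.values())
print(trace["SRS5.1.1.10"])
print(link_count)
```

To read the real file, pass its contents to the same function, for example parse_handtrace(open("handtrace.txt").read()). If the parse matches the description, the total link count over the full file should come to 361.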

 
Artifacts:
High-level requirements
Low-level requirements
 
Size: Large
235 high-level requirements, 220 low-level requirements
361 true links in answerset
 
Sample Artifact Element:
"Built-In Test CSC The Built-In Test (BIT) CSC is a Level 2 reuse component from the SSFF and INSTRUMENT Y projects. The detailed design of the BIT CSC follows. The Built-In Tests CSC, identified DPU-BIT, performs the Stage 2 Built-In Tests (BIT). The Stage 2 BIT includes a test of the SCM EDAC circuit, a checksum test on SCM PROM, a MIL-STD-1553B internal BIT, and a memory test of the DCI Data Buffers. The results of the BIT are recorded in the SYS_CNFG_AREA in EEPROM and are also maintained in DRAM. If an error occurs in one of the functions performing the BIT, the test result will be reported as a failure in the test itself." (from DPUSDS5.3.0)
 
Benchmarks:
Sundaram, Hayes, and Dekhtyar [3].
(Note: the following results were obtained with no feedback and no filtering. For results with feedback and/or filtering, please see [3].)
Method [Precision, Recall]
tf-idf [1.5%, 97.7%]
tf-idf+Thesaurus [1.5%, 97.7%]
Latent Semantic Indexing (100 dimensions) [0.9%, 98.6%]
LSI+Thesaurus (100 dimensions) [0.9%, 98.6%]
LSI (200 dimensions) [0.9%, 98.8%]
LSI+Thesaurus (200 dimensions) [0.9%, 98.8%]
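
For readers who want to compare their own candidate links against the answerset, precision and recall can be computed as in the sketch below. It uses the standard set-based definitions (precision is correct links retrieved divided by all links retrieved, recall is correct links retrieved divided by all links in the answerset). The exact evaluation procedure used in [3] is not described here, so this is an assumption. The function name precision_recall and the candidate links are made up for illustration.

```python
from typing import Iterable, Set, Tuple

Link = Tuple[str, str]  # (high-level ID, low-level ID)


def precision_recall(candidates: Iterable[Link],
                     answerset: Iterable[Link]) -> Tuple[float, float]:
    """Return (precision, recall) of candidate links against the answerset."""
    cand: Set[Link] = set(candidates)
    truth: Set[Link] = set(answerset)
    hits = len(cand & truth)  # candidate links that are also true links
    precision = hits / len(cand) if cand else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


# Toy answerset built from identifiers in the handtrace.txt excerpt.
truth = {
    ("SRS5.1.1.10", "DPUSDS4.4.1.1"),
    ("SRS5.1.1.10", "DPUSDS5.1.0.2"),
    ("SRS5.1.1.12", "DPUSDS5.1.4.2"),
}
# Hypothetical tool output: two correct links and two incorrect ones.
cand = {
    ("SRS5.1.1.10", "DPUSDS4.4.1.1"),
    ("SRS5.1.1.10", "DPUSDS5.1.4.2"),
    ("SRS5.1.1.12", "DPUSDS5.1.4.2"),
    ("SRS5.1.1.12", "DPUSDS5.1.0.2"),
}
p, r = precision_recall(cand, truth)
print(f"precision={p:.1%} recall={r:.1%}")
```

As a point of reference, the dataset has 235 x 220 = 51,700 possible high-level/low-level pairs and 361 true links, so a method that returned every possible pair would reach 100% recall at about 0.7% precision under these definitions.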
 
Referenced in:
1. Jane Huffman Hayes, Alexander Dekhtyar, Senthil Sundaram, "Measuring the Effectiveness of Retrieval Techniques in Software Engineering," October 2004, (TR422-04).
2. Jane Huffman Hayes, Alexander Dekhtyar, Senthil Sundaram, Sarah Howard, "Helping Analysts Trace Requirements: An Objective Look," in Proceedings of IEEE Requirements Engineering Conference (RE) 2004, Kyoto, Japan, September 2004, pp. 249-261.
3. Senthil Sundaram, Jane Huffman Hayes, Alexander Dekhtyar, "Baselines in Requirements Tracing," Proceedings of Workshop on Predictive Models of Software Engineering (PROMISE), associated with ICSE 2005, St. Louis, MO, May 2005, pp. 12-17.
4. Xuchang Zou, Raffaella Settimi, Jane Cleland-Huang, "Improving Automated Requirements Trace Retrieval: A Study of Term-Based Enhancement Methods," Empirical Software Engineering, April 2010, pp. 119-146.
 
Acknowledgment and Disclaimer:
You and your institution are responsible for assuring that any publication including World Wide Web pages developed under or based on NSF support of your project includes an acknowledgment of that support in the following terms: "This material is based upon work supported by the National Science Foundation under Grant No. 0811140."

You and your institution are also responsible for assuring that, in any publication including World Wide Web pages which contains material based on or developed under your award, (other than a scientific article or paper appearing in a scientific, technical, or professional journal) this acknowledgment is accompanied by the following disclaimer: "Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation."